What is Cheminformatics?

SIRCh: Selected Internet Resources for Chemistry

Variously known as chemoinformatics, chemical informatics, or even chemiinformatics, cheminformatics is the application of computer technology to chemistry in all of its manifestations. Much of the current use of cheminformatics techniques is in the drug industry. Indeed, one definition of cheminformatics is "the mixing of information resources to transform data into information and information into knowledge, for the intended purpose of making decisions faster in the arena of drug lead identification and optimization." Now cheminformatics is being applied to problems across the full range of chemistry.

Cheminformaticians often work with massive amounts of data. They construct information systems that help chemists make sense of the data, often attempting to accurately predict the properties of chemical substances from a sample of data. Thus, through the application of information technology, cheminformatics helps chemists organize and analyze known scientific data to assist in the development of novel compounds, materials, and processes. People who work in cheminformatics may concentrate on molecular modeling, chemical structure coding and searching, chemical data visualization, or a number of other areas of specialization. Indeed, the various computer graphics codes for chemical structures that let us both view and search chemical structures via computer were developed by cheminformaticians.
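As a minimal sketch of what "predicting the properties of chemical substances from a sample of data" can look like, the following fits a one-descriptor linear model of boiling point against carbon count for a few n-alkanes and then predicts an unseen compound. The descriptor, the helper names `fit_line` and `predict_boiling_point`, and the approximate boiling points are illustrative assumptions, not part of the original text; real structure/property models use many descriptors and carefully validated data.

```python
# Minimal QSPR-style sketch: fit boiling point against carbon count for
# n-alkanes by ordinary least squares, then predict an unseen compound.
# The data are approximate literature boiling points (deg C), chosen here
# only for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Training sample: butane through octane.
carbons = [4, 5, 6, 7, 8]
boiling_c = [-0.5, 36.1, 68.7, 98.4, 125.7]

slope, intercept = fit_line(carbons, boiling_c)

def predict_boiling_point(n_carbons):
    """Apply the fitted line to a compound outside the training sample."""
    return slope * n_carbons + intercept

print(f"model: bp = {slope:.2f} * n_carbons + {intercept:.2f}")
print(f"predicted bp of nonane (9 carbons): {predict_boiling_point(9):.1f} C")
```

The straight line overshoots the measured boiling point of nonane (roughly 151 °C), because the real trend flattens as the chain grows. That gap is a small reminder of why practical models need richer descriptors and validation against data held out of the fit.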

Methods and tools used in cheminformatics include:

Quantitative Structure/Activity or Quantitative Structure/Property Relationships (QSAR, QSPR)
Genetic Algorithms
Statistical Tools (e.g., recursive partitioning)
Data Analysis Tools
Visualization Techniques
Chemically-Aware Web Languages (e.g., Chemical Markup Language, CML).
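
To make the last item in the list above more concrete, here is a hedged sketch of writing a very small molecule as CML-style XML using only the Python standard library. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`, `order`) follow CML conventions as understood here, the XML namespace is omitted for brevity, and the helper `molecule_to_cml` is hypothetical; check the CML schema before relying on any of this.

```python
# Sketch: serialize a tiny molecule (water) as CML-style XML and read it back.
# Names follow CML conventions as understood here; the namespace is omitted.
import xml.etree.ElementTree as ET

def molecule_to_cml(mol_id, atoms, bonds):
    """atoms: list of (atom_id, element); bonds: list of (id1, id2, order)."""
    molecule = ET.Element("molecule", {"id": mol_id})
    atom_array = ET.SubElement(molecule, "atomArray")
    for atom_id, element in atoms:
        ET.SubElement(atom_array, "atom", {"id": atom_id, "elementType": element})
    bond_array = ET.SubElement(molecule, "bondArray")
    for first, second, order in bonds:
        ET.SubElement(
            bond_array,
            "bond",
            {"atomRefs2": f"{first} {second}", "order": str(order)},
        )
    return ET.tostring(molecule, encoding="unicode")

water = molecule_to_cml(
    "water",
    atoms=[("a1", "O"), ("a2", "H"), ("a3", "H")],
    bonds=[("a1", "a2", 1), ("a1", "a3", 1)],
)
print(water)

# Because the result is ordinary XML, any XML parser can recover the chemistry.
parsed = ET.fromstring(water)
elements = [atom.get("elementType") for atom in parsed.iter("atom")]
print(elements)
```

The point of a markup language of this kind is that atoms and bonds become labeled, machine-readable data rather than a picture, so generic web and XML tooling can exchange and query them.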

A sound knowledge of chemistry and excellent facility in computer science are required to be an effective practitioner in the cheminformatics field. Chemical and pharmaceutical companies are in great need of people with such skills. The curriculum for the graduate programs in Cheminformatics in the Indiana University School of Informatics educates students in the following major aspects of chemical informatics:

Information Acquisition: Methods used for generating and collecting data empirically (experimentation) or from theory (molecular simulation)
Information Management: Storage and retrieval of information
Information Use: Data analysis, correlation, and application to problems in the chemical and biochemical sciences.

Information Acquisition

Information acquisition is highly dependent on the computer today. With the integration of modern sensors into chemical instrumentation, the volume of data that can be generated is enormous. Future instrumentation will incorporate information from existing chemical databases, employ modeling techniques, and analyze experimental data as they are generated. Such "smart instruments" will significantly improve the ability of the user to make intelligent decisions about the course of an experiment while the data are being collected and analyzed.

There now exist two complementary pathways for generating and collecting information in the chemical sciences: experimentation and computer simulation. Traditionally, the gathering of data from experiments was done manually, but with the development of computers small enough to be purchased by individual laboratories, the phrase "computers in chemistry" arose to describe their use. Several decades ago this expression meant interfacing a computer to an instrument such as a spectrometer or a chromatograph and collecting the data in real time for storage and later manipulation. While this is still being done with microprocessors built into the instruments themselves, a more encompassing label for the wide range of chemical activities involving computers is computational chemistry.

Computational chemistry seeks to predict quantitatively molecular and biomolecular structures, properties, and reactivity by computational methods alone. It uses modern chemical theory to predict the speed of unknown reactions and the synthetic sequences by which complex new molecules can be made most efficiently. Computational chemistry allows chemists to explore how things work at the atomic and molecular levels and to draw conclusions that are impossible to reach by experimentation alone. Thus, computational chemistry supplements experimentally derived data.

One aspect of computational chemistry is molecular modeling. Molecular modeling involves the investigation of three-dimensional molecular structures using classical and quantum mechanical methods assisted by computer graphics. Other molecular modeling techniques include quantitative structure-property relationships, which find applications in structure-based drug design, similarity searching, and molecular shape prediction. Molecular modeling techniques are utilized extensively in pharmaceutical research, especially to predict pharmacophores--the structural features of molecules required for particular biological activities. Molecular modeling is now used routinely to generate data concerning energetics, dynamics, and other information at the molecular scale that is not amenable to experimentation.

Recent advances in combinatorial synthesis and high throughput screening technologies now allow for the preparation and analysis of hundreds of thousands or even millions of molecules in a very short period of time. Combinatorial chemistry techniques grew out of several disciplines, including organic, medicinal, and physical chemistry, engineering and robotics, computational chemistry, informatics, and screening technology. Robotics as used in combinatorial chemistry provides the drug industry a powerful tool with which to screen millions of potential compounds in a fraction of the time it would have taken to evaluate even a few dozen compounds a decade ago. Now widely employed in the pharmaceutical area, combinatorial chemistry has also begun to find applications in materials science. Because so much information is being generated and collected from combinatorial technologies, there is a concomitant problem associated with storing and retrieving those data. That problem is now being addressed by those skilled in cheminformatics.
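
A short sketch can show why combinatorial approaches produce so many compounds, and hence so much data. It enumerates every combination of substituents on a scaffold with three variable positions. The building-block names, the position labels `R1` to `R3`, and the helpers `enumerate_library` and `library_size_for` are hypothetical and chosen only for illustration.

```python
# Sketch: enumerate a small combinatorial library and show how library
# size grows as the product of the building-block counts per position.
from itertools import product
from math import prod

# Hypothetical building blocks for a scaffold with three variable positions.
building_blocks = {
    "R1": ["methyl", "ethyl", "phenyl"],
    "R2": ["fluoro", "chloro"],
    "R3": ["amine", "hydroxyl", "carboxyl", "nitrile"],
}

def enumerate_library(blocks):
    """Yield one dict per library member, mapping position to substituent."""
    positions = list(blocks)
    for combo in product(*(blocks[p] for p in positions)):
        yield dict(zip(positions, combo))

def library_size_for(counts):
    """Library size for the given number of building blocks per position."""
    return prod(counts)

library = list(enumerate_library(building_blocks))
library_size = library_size_for(len(v) for v in building_blocks.values())

print(len(library), "compounds enumerated")
print(library[0])
print(library_size_for([100, 100, 100]), "compounds from 100 blocks per position")
```

With only 100 candidate building blocks at each of three positions, the library already reaches one million members, which is why storage and retrieval quickly become the limiting problem.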

Information Management

Many of the applications for storing and retrieving chemical data have grown out of the rapid developments in chemical structure coding and searching. The advances in structure-based applications have led to integrated chemical information systems--more and more of which have Web interfaces--and to specialized applications such as Laboratory Information Management Systems (LIMS). The ability to search large secondary databases such as Chemical Abstracts or Medline easily and precisely and to move seamlessly back and forth between the original primary journal literature and the abstracting and indexing databases is one of the truly great achievements of modern cheminformatics research.

Chemists have developed their own communication system (chemical nomenclature and structure systems) that adds a unique dimension to informatics. There is a confluence of activities in cheminformatics that is centered on the chemical structure (both 2-D and 3-D depictions). Two-dimensional chemical structural databases have evolved from traditional chemical structure diagrams into structure searching and substructure searching systems. In the late 1980s, attention turned to 3-D structure searching and representations of chemical structures in three dimensions. Techniques for the full description of the conformational space of flexible molecules and similarity searching techniques have also been developed. These are now being incorporated into chemical information storage and retrieval systems.
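
Similarity searching of the kind mentioned above is commonly illustrated with the Tanimoto coefficient computed on structural fingerprints. The sketch below assumes each compound has already been reduced to a set of "on" fingerprint bits. The compound names, bit sets, threshold, and the helpers `tanimoto` and `similarity_search` are hypothetical; a real system would derive fingerprints from the structures with a cheminformatics toolkit.

```python
# Sketch: rank database compounds by Tanimoto similarity to a query.
# Fingerprints are represented as sets of "on" bit positions (made-up values).

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical database of precomputed fingerprints.
database = {
    "compound_A": {1, 4, 7, 9, 12},
    "compound_B": {1, 4, 7, 10},
    "compound_C": {2, 3, 5},
}

def similarity_search(query_fp, db, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in db.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

query = {1, 4, 7, 9}
results = similarity_search(query, database)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

Unlike substructure searching, which asks a yes-or-no question about whether a fragment is present, a similarity search returns a ranked list, so the threshold controls how far from the query the results may stray.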

Information Use

The computer has enabled chemists to analyze and correlate data from massive chemical and biochemical databanks, and when coupled with chemical visualization and modeling techniques, it is revolutionizing chemical research. Informatics techniques help create an integrated information environment in which all aspects of chemical research and development can be dealt with in a unified system. Not only can chemical structures be used as search keys in such systems, but also unknown properties and spectra can be predicted using cheminformatics tools and techniques that draw on the existing knowledge base of chemistry. Data mining has emerged as a significant factor in the reassessment of data collected over time in an organization. Chemists can now access decades of raw data stored in disparate formats and obtain useful results to build on the research that has taken place in past years. Tying together through Web Services many disparate sources of chemical and life sciences data and information into a usable and useful whole is one of the main activities of the CICC (Chemical Informatics and Cyberinfrastructure Collaboratory) at Indiana University.

For further reading:

Brown, Frank. "Chemoinformatics - a ten year update." (Editorial opinion) Current Opinion in Drug Discovery & Development May 2005, 8(3), 298-302.

Engel, Thomas. "Basic overview of chemoinformatics." Journal of Chemical Information and Modeling 2006, 46(6), 2267-2277. DOI: 10.1021/ci600234z

Gasteiger, Johann. "The central role of chemoinformatics." Chemometrics and Intelligent Laboratory Systems May 26, 2006, 82(1-2), 200-209 (Special Issue).

Chen, William Lingran. "Chemoinformatics: Past, Present, and Future." Journal of Chemical Information and Modeling 2006, 46(6), 2230-2255. DOI: 10.1021/ci060016u