Chemical Abstracts Service's Registry File is the largest single collection of data that can be used to identify a chemical substance. Each unique chemical substance is assigned a Registry Number, which CAS uses in preference to a chemical name to index documents in the CA or CAplus Files. Much of the descriptive information about a compound (its molecular formula, variant names, and detailed information about its makeup, including the structure) is found in the Registry File. Furthermore, in recent years, actual data (experimental or calculated) have been added to the file, making it much more like a huge handbook. The Registry Number serves as the unique identifier of the record. The Registry File supports a number of search techniques that are built on the chemical name and other fields included in the Registry File records.
In the printed CA, there is no Registry Number Index. Instead, the "Chemical Substance Index" ("CSI") links the preferred CA Index Name for the substance to the documents that have information on it. However, names for classes of compounds are indexed in the "General Subject Index". Also, in the printed Chemical Abstracts, supplemental access to the printed product is found in the "Formula Indexes". The "CSI" has dictated much of the indexing policy for supplemental terms used to describe the role of the chemical substance in the document. The broad indexing terms found in the CAS Roles in the CA File and the Standard Subject Divisions in the printed CSI can be of considerable use in retrieving the precise information of interest about a compound on which much has been written. NOTE: As of January 1, 2010, Chemical Abstracts is no longer available in print.
Molecular formula searching in CA is based on the Hill Formula system (described below). The concept of the dot-disconnected formula for salts, addition compounds, and mixtures is important in both the database and the printed "Molecular Formula Index" to Chemical Abstracts.
A search for information on a single chemical substance may start with the name of the substance, its molecular formula, or various other words or codes that can be associated with it. (See: Locating All CA File References Citing a Chemical Substance and CAS REGISTRY: Finding CAS Registry Numbers) In this chapter, we will encounter various coding systems that have been applied to the retrieval of chemical substances from both printed and computer-based sources. The main database to search for such information is the CAS Registry File, which now has in excess of 135,000,000 records for chemical substances (including biosequences). Many of the entries in the Registry File are for sequences of biological macromolecules. The bulk of the remaining small molecule entries are for organic compounds, either simple organics (esters, steroids, heterocycles, stereoisomers, etc.) or such things as mixtures, polymers, and organic salts. Just over 10% of the file is comprised of inorganic compounds.
Mastery of formal chemical nomenclature is a skill possessed by few chemists nowadays. The International Union of Pure and Applied Chemistry (IUPAC) determines the recommended practices for assigning official names to chemical substances. With a knowledge of the IUPAC nomenclature rules, a chemist can visualize and depict the correct structure of even complex chemical compounds. However, creating such a name from scratch is another matter. An excellent Web guide to chemical nomenclature is Charles H. Davis's Chemical Nomenclature Lite. Fox and Powell's classic work, Nomenclature of Organic Compounds: Principles and Practice, appeared in a 2nd edition in 2001. For other types of substances (and nomenclature in specific areas of chemistry), see the so-called "color books" of the IUPAC. The Enzyme Commission assigns EC numbers for enzymes that are very useful in computer searching.
Until late 2006, Chemical Abstracts Service (CAS) made major changes to their chemical nomenclature policies only at the boundaries of the five-year collective index periods. They have now abandoned that policy, preferring to make changes to CA Index Names as needed to ensure that the CAS Registry System has the most current, usable information. The names will now conform more closely with the names that chemists typically use. Among the nomenclature improvements to be implemented are more uniformly cited locants, reduction in the number of stereoparent names, and the elimination of nearly 3,000 obscure stereoparents. Unexpressed amides also will be disregarded.
Substance Searching Using Chemical Abstracts Service Registry Numbers
One very effective method of retrieving chemical substance information from a reference source is to use the Chemical Abstracts Service REGISTRY NUMBER for the substance. The Registry Number is a unique number assigned to each substance indexed by CAS. The CAS RN is a number of the format Y-XX-X, where Y can be from two to seven digits and each X is one digit, for example, 494-12-2. (The longest form, 10 digits in all, is a recent expansion.) The Registry Number is found in many databases and increasingly as an index to printed reference works. The Registry File started in 1965 with new substances that were encountered from that date forward. Older substances from records dating from 1907-65 have since been entered into the system. Now that CAS has finished this task, all compounds discovered after 1907 should be in the database. For compounds discovered prior to 1907, it is wise to search the Beilstein and Gmelin databases on Reaxys, which have coverage back to the 18th century.
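The layout of a Registry Number can be checked mechanically. The sketch below relies on the CAS check-digit rule, which this section does not spell out and is stated here as an assumption: the final digit equals the sum of the preceding digits, weighted 1, 2, 3, ... from the right, modulo 10. The name `is_valid_cas_rn` is illustrative only.

```python
import re

# Two to seven digits, two digits, one check digit (at most ten digits in all).
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the CAS RN layout and a consistent check digit."""
    match = CAS_RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    for rn in ("494-12-2", "91-56-5", "494-12-3"):
        print(rn, is_valid_cas_rn(rn))
```

A check like this only catches transcription errors. It cannot tell whether a well-formed number has actually been assigned to a substance.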
The Registry Number appears in the indexing of CA and CAplus File records in preference to the formal name of the compound. In volume 106 of Chemical Abstracts is found abstract number 195826 for the following article:
Grieco, Paul A.; Bahsas, Ali. Reactions of allylstannanes with in situ generated immonium salts in protic solvent: a facile aminomethano destannylation process. J. Org. Chem. (1987), 52(7), 1378-80. CODEN: JOCEAH ISSN:0022-3263. CAN 106:195826 AN 1987:195826 CAPLUS
The indexing below includes part of the Registry Numbers for compounds discussed in the article.
CAS Registry Numbers are assigned to organic and inorganic substances, metals, alloys, minerals, polymers, coordination compounds, elements, isotopes, peptides, enzymes, biomolecular sequences, and nuclear particles. However, the mere mention of a compound in a document is not enough to ensure that the indexers at Chemical Abstracts Service will tie a CAS RN to the record for that document. To get an entry in the CA indexes, there must be something new reported about the substance. It may be a new method of preparation, a new source for the substance, a new reaction, a new kinetic or mechanistic study, new chemical or physical properties, a new method of analysis, a new use or application, or a new biological effect. Chemical reactants and the resulting products are routinely indexed, but reagents are not indexed unless there is a new preparation of the reagent itself or a novel use of a standard reagent.
In 2008, CAS entered into a cooperative venture with Wikipedia to provide CAS Registry Numbers for chemical substances of widespread general interest. The result is Common Chemistry, a Web resource where approximately 7,900 substances can be searched without cost by chemical name or CAS Registry Number. Entering the CAS RN for Isatin, 91-56-5, brings up a record with the CAS Preferred Name, 1H-Indole-2,3-dione, 18 other names for Isatin, the molecular formula, a 2D structural drawing, and the link to the Wikipedia article on Isatin.
The "Index Guide" and Chemical Name Searching in the Printed Chemical Substance Indexes
Just as the "Index Guide" controls the vocabulary that must be used in the Chemical Abstracts "General Subject Index," it also provides the correct name to use in searching the CA "Chemical Substance Index". For example, a check of the "Index Guide" for "Flavan" finds the following:
Flavan See 2H-1-Benzopyran, 3,4-dihydro-2-phenyl- [494-12-2]
In alphabetizing chemical substance names in the index, locant numbers, stereo designators, etc. are ignored. Thus, we must look in the "B" section of the printed CA "Chemical Substance Index" for "Benzopyran" in order to find index entries on the compound. Note that the CAS Index Name for Flavan is inverted, with the name of the so-called HEADING PARENT listed first. This keeps structurally related compounds in the same area of the index. The basic Heading Parent compound is listed first, followed by derivatives and other structurally related compounds. The entries in the "Chemical Substance Index" include the TEXT MODIFICATIONS (other subject words) that give more information about the documents that are indexed.
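The effect of ignoring locants when alphabetizing can be approximated in a few lines. This is a rough heuristic under a stated assumption, not the CAS filing rules: it skips everything before the first run of two or more letters, which drops prefixes such as "2H-1-" or "1,4-". Word-like stereo designators such as cis- or trans- are not handled. The function name `index_sort_key` is hypothetical.

```python
import re


def index_sort_key(index_name: str) -> str:
    """Approximate the alphabetizing key for an inverted CA Index Name.

    Heuristic (an assumption, not the CAS rule set): start at the first
    run of two or more letters, so leading locants are ignored.
    """
    match = re.search(r"[A-Za-z]{2,}", index_name)
    start = match.start() if match else 0
    return index_name[start:].lower()


names = [
    "1H-Indole-2,3-dione",
    "2H-1-Benzopyran, 3,4-dihydro-2-phenyl-",
    "Benzene, 1,4-dibromo-",
]
for name in sorted(names, key=index_sort_key):
    print(name)
```

With this key, the Index Name for Flavan files under "B" for Benzopyran, as described above.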
Since 2007, CAS has no longer categorized information by collective index periods, so new CA Index Names no longer carry a "CI" label such as 6CI, 7CI, 8CI, or 9CI.
Qualified Substances in CAS Files and Indexes
If not much has been written about the substance during the indexing period, all of the indexed information is found in a single alphabetical sequence under the Index Name in the printed "Chemical Substance Index". However, when the index entries become voluminous, CAS divides them into Standard Subject Divisions. The compounds so treated are referred to as QUALIFIED SUBSTANCES. Originally seven qualifiers were used, but two additional terms (formation and processes) were added in 1994, and one phrase (uses and miscellaneous) was subsequently split apart. The qualifiers are:
- ANALYTICAL STUDY (ANST) - for methodology of detection or determination of the substance, or its analysis; also for separation if the intent is analytical.
- BIOLOGICAL STUDY (BIOL) - for biochemical uses and for processes, properties, occurrence, and formation in biological systems (including nonfossil by-products of living matter, food, etc.). Studies on the herbicidal, pesticidal, and pharmaceutical use of the material are also placed in this subdivision.
- FORMATION, NONPREPARATIVE (FORM) - for the incidental formation of the substance in a nonpreparative study (from v. 121 onward).
- MISCELLANEOUS (MSC) - studies not otherwise classifiable.
- OCCURRENCE (OCCU) - for natural occurrence (in other than biological systems).
- PREPARATION (PREP) - for synthesis, manufacture, incidental formation (other than biochemical), recovery, separation, and purification.
- PROCESS (PROC) - for nonreactive treatment of the substance, nonpreparative removal of the substance, and complex treatments of the substance (from v. 121 onward).
- PROPERTIES (PRP) - for physical and chemical properties and related non-reaction processes.
- REACTIONS (RACT) - for chemical changes that lead to products differing chemically from the starting material, including nuclear interactions (other than simple scattering), corrosion, neutralization, enolization, isomerization, and tautomerism.
- USES (USES) - for applications (other than biochemical), removal (in purification procedures), industrial processing.
ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. Roles were first applied to the new online CA File records with v. 121 (July 1994). They were then applied retrospectively to all CA File records by means of a computer algorithm. Since there are over 60 specific roles and 9 broad super roles, they substantially expand the indexing terms that were used prior to their introduction. The role terms give a more precise link to the substance. For example, it is now possible to specify not only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as opposed to industrial manufacture. In the past, there was no distinction made in the use of the term "Preparation" in such cases.
Searching the Registry File with a Chemical Name
The Registry File is the largest single source of chemical names in existence. It can be searched on the STN command-language system by a trade or common name for a substance (CN), by its CAS Index Name (CN) in inverted order, or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching) Just as we had a Basic Index that is formed from subject words in a bibliographic database, there is also a basic index for the Registry File when searched on STN. The BASIC INDEX of the Registry File includes both chemical name fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period before and after the Greek part of the name. An example of such a chemical name search in SciFinder is below. Note that in the SciFinder system, the search will work with or without the periods around the "alpha," but in STN command-language searching, the dots are mandatory.
Note that in SciFinder, you should not invert the name when searching a CA Index Name. For example, entering Benzene, 1,4-dibromo will not work, but searching 1,4-dibromobenzene will.
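The convention for Greek characters in STN name searching (spelled out, with a period before and after) is easy to apply programmatically before submitting a name. The sketch below is a minimal illustration. The lookup table is deliberately partial, and the function name `spell_out_greek` is hypothetical.

```python
# Partial table of Greek letters; extend as needed.
GREEK_NAMES = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta",
    "ε": "epsilon", "λ": "lambda", "μ": "mu", "π": "pi", "ω": "omega",
}


def spell_out_greek(name: str) -> str:
    """Replace Greek letters with the dotted, spelled-out form (e.g. .alpha.)."""
    return "".join(f".{GREEK_NAMES[ch]}." if ch in GREEK_NAMES else ch
                   for ch in name)


print(spell_out_greek("α-pinene"))
```

Characters that are not in the table pass through unchanged, so plain names are not altered.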
Searching the Registry File and Printed CA Indexes with a Molecular Formula: The Hill System
The system most commonly used today for arranging molecular formulas in indexes is the HILL SYSTEM. The Hill System covers both organic and inorganic compounds according to the following rules:
1. Sum individually all like atoms within the molecule.
2. If carbon is present, place it and the total number of C's first in the formula.
3. If both carbon and hydrogen are present, place hydrogen and the total number of H's second. Note that if carbon is not present, rule 4 applies to the substance, and the H is placed in its regular position in the alphabet.
4. All other atoms in the molecule are arranged alphabetically. That means that for inorganic substances without carbon, the arrangement is alphabetical.
Within the index itself, the numbers of atoms of each element come into play. Here is an example of compounds arranged for a Hill System Index (read down the left column, then the right):
|Al6 Ca5 O14||C8 H5 N O2|
|B2 O3||C15 H24 N2|
|C H Cl3||C22 H24 F N3 O2|
|C H N O||Ca O3 Ti|
|C2 H4 Br Cl||H2 O4 S|
|C2 H5 Al Br2||O3 Pb Rb2|
|C5 H8 O2||O5 P14 Zn7|
Note that in the Registry File (including the SciFinder approach), the formulas may be searched with or without spaces between the element symbols. They are put here for clarity. The Hill System gives rise to some formulas that are quite different from those a chemist is used to seeing, e.g., H2O4S for sulfuric acid or BrH for hydrobromic acid.
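The four Hill System rules, and the way an index files the resulting formulas, can be sketched as follows. This is a minimal illustration that assumes simple formulas with no parentheses, charges, or dot-disconnected parts. The function names are hypothetical, and the sort key (element symbol, then atom count, element by element) is an assumption that reproduces the example ordering given above rather than a statement of the full CAS filing rules.

```python
import re
from collections import Counter

ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'H2SO4' or 'C H Cl3'.

    Rule 1: like atoms are summed. Parentheses, charges, and
    dot-disconnected parts are not handled in this sketch.
    """
    counts = Counter()
    for symbol, number in ELEMENT_TOKEN.findall(formula.replace(" ", "")):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_order(counts: Counter) -> list:
    """Return (symbol, count) pairs in Hill order (rules 2 to 4)."""
    symbols = sorted(counts)
    if "C" in counts:
        # Carbon first, hydrogen second, everything else alphabetical.
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    return [(s, counts[s]) for s in symbols]


def hill_formula(formula: str) -> str:
    """Rewrite a formula in Hill order, omitting counts of 1."""
    return "".join(f"{s}{n if n > 1 else ''}"
                   for s, n in hill_order(parse_formula(formula)))


def hill_index_key(formula: str) -> list:
    """Sort key for an index: compare symbol, then count, element by element."""
    return hill_order(parse_formula(formula))


print(hill_formula("H2SO4"), hill_formula("HBr"))
```

Sorting a list of formulas with `key=hill_index_key` places C15 H24 N2 after C8 H5 N O2 because atom counts are compared as numbers, not as text.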
The printed CA "Formula Indexes" do not have entries for the 600 or so qualified substances that have lots of information written about them. Thus, we find in the Chemical Abstracts "Formula Index" from the 10th Collective Index period (1977-81):
C8H5NO2
1H-Indole-2,3-dione [91-56-5]
See Chemical Substance Index
sodium salt [3486-31-5], 90: 6180p; 91: 157670v; 94: 209034z
This tells us that the printed CA "Chemical Substance Index" must be used for detailed information on isatin itself, but it gives direct information that three documents dealt with the sodium salt of isatin during the period. When a substance would have more than 20 entries in a 6-month volume index or more than 50 entries in the 5-year collective "Formula Indexes," a "See" reference is made to the name of the substance in the "Chemical Substance Index". We find in the "Formula Index" the abstract numbers for the sodium salt of isatin since there were relatively few documents written about that compound during the 10th Collective Index period.
A chemical formula in the Hill System may have more than one substance with that formula. For a given formula, isomers are arranged alphabetically by the CAS Index Name.
In the online molecular formula index of the Registry File (/MF), salts, addition compounds, and mixtures have the molecular formulas for the components arranged separately, with ratios for salts and addition compounds specified when known. If the ratios are unknown, a lower case "x" before the second formula or subsequent formulas is used, e.g.,
C15 H24 N2 . 2 Cl H
C22 H24 F N3 O2 .x H2 O4 S
These are examples of the so-called DOT-DISCONNECTED FORMULAS. (See: Tips for Molecular Formula Searching)
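A dot-disconnected formula can be taken apart into its components and ratios with a short parser. The sketch below assumes the notation shown in the two examples above (a period between components, an optional integer, fraction, or lower case "x" multiplier in front of a component). The function name `split_dot_disconnected` is hypothetical, and this is not how the Registry File itself processes formulas.

```python
import re
from fractions import Fraction

# Optional multiplier (integer, fraction, or "x"), then the component formula.
COMPONENT = re.compile(r"^(x|\d+(?:/\d+)?)?\s*([A-Z].*)$")


def split_dot_disconnected(formula: str) -> list:
    """Split 'A . n B' into (ratio, component formula) pairs.

    ratio is a Fraction, or the string 'x' when the ratio is unknown;
    a component without a multiplier gets a ratio of 1.
    """
    parts = []
    for piece in formula.split("."):
        match = COMPONENT.match(piece.strip())
        if match is None:
            raise ValueError(f"unrecognized component: {piece!r}")
        multiplier, component = match.groups()
        if multiplier is None:
            ratio = Fraction(1)
        elif multiplier == "x":
            ratio = "x"
        else:
            ratio = Fraction(multiplier)
        parts.append((ratio, component.replace(" ", "")))
    return parts


print(split_dot_disconnected("C15 H24 N2 . 2 Cl H"))
print(split_dot_disconnected("C22 H24 F N3 O2 .x H2 O4 S"))
```

Keeping the ratio as a `Fraction` allows fractional multipliers to be stored exactly.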
Molecular Formulas of Types of Compounds in CA/STN
Simple salts such as sodium chloride are treated as any other Hill Formula: ClNa.
1. Metal Salts of Complex Organic or Organometallic Acids
In general these substances have the molecular formula of the cation followed by the dot disconnect symbol (the period) and a multiplier times the molecular formula of the anion.
For metal salts of organic acids, the metal replaces one or more hydrogens attached to N, O, P, As, Se, or Te in an organic substance. The CAS structuring conventions treat these substances in the following manner:
- The organic portion is treated as a neutral molecule, including the acidic hydrogen atoms.
- The metal is viewed as a separate, unattached fragment.
- The ratio between the organic acid and the metal atom is expressed. (If unknown, the ratio is expressed as "x".)
The multiplier for the organic acid is always 1. For the metal, it indicates the oxidation state as a fraction, e.g., C7 H6 O2 . 1/2 Cu
Example: 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, trisodium salt (trisodium citrate), C6 H8 O7 . 3 Na
A search of SciFinder for the molecular formula yielded ten answers at the time of the search, among them:
- Unknown ratio: C6 H8 O7 . x Na
- Mixed metal salt: C6 H8 O7 . Ca . Na
- Metal salt of an alcohol: C6 H6 O2 . 1/2 Ba
- Metal salt of a radical ion: C10 H8 . Na
- Metal salts of two or more different acids have the hydrogens removed, and bonds are formed from the heteroatoms of the acids to the metal.
- Metal salts of dithiocarbamates (and Se or Te analogs) are represented as N-C(=Q)-Q, where Q = S or Se.
- Likewise, metal salts of dithiophosphates are represented as R2P(=Q)-Q, where R = halide, halogenoid, or carbon-containing substituent.
- Salts of coordination compounds, e.g., C7 H4 Cu O3 and C18 H18 O8 Zn.
Organometallic compounds in the Registry File are substances which have a carbon atom directly bonded to a metal atom, e.g., Phenyl Lithium: C6 H5 Li. Note, however, that carbonium ions and carbanions are generally found as dot-disconnects in the Registry File.
Coordination compounds in the Registry File are substances in which an atom or group of atoms is bound to a central metal atom by a pair of electrons supplied by the coordinate group and not by the central metal atom, e.g., metallocenes. These substances have the Class Identifier code CCS in the Registry File records.
Polymers are indicated with the molecular formula of the repeating unit(s) in parentheses, to which an "x" is appended to indicate repetition. For example, the molecular formula for the homopolymer of 1,3-butadiene is (C4H6)x. A search for a polymer by molecular formula may retrieve variant forms of the substance, because the syndiotactic, isotactic, graft, and copolymer forms all have separate Registry Numbers.
Molecular Formulas in The Basic Index of the Registry File
The Registry File's Basic Index contains chemical name fragments and molecular formula fragments (including molecular formulas for individual components of multi-component substances and single component substances). Formula fragments searched in the Basic Index must be entered without spaces.
In command-driven searching, it is possible to search for various information about the elements comprising a chemical substance, such as:
- Element Symbol, indicating the presence of an element (/ELS), e.g., => S B/ELS and H/ELS
- Element Count, to specify the number of unique elements in a component or substance (/ELC or /ELC.SUB)
- Element Formula, the molecular formula of components without the numbers that depict the ratios (/ELF), e.g., => S AL CO LA O/ELF
- Periodic Group, the column and row designations for elements, e.g., => S B6/PC or => S LNTH/PG
- Material Composition, when looking for alloys
There are many more options for such searching on the STN command-language system.
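To make the element-based fields more concrete, the sketch below derives comparable values from a molecular formula on the local machine. It is not STN code and does not query the Registry File. The function names are hypothetical, and the choice to keep the symbols in the order they appear in the input formula is an assumption.

```python
import re

ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)\d*")


def element_symbols(formula: str) -> list:
    """Distinct element symbols, in order of first appearance."""
    return list(dict.fromkeys(ELEMENT_TOKEN.findall(formula)))


def element_count(formula: str) -> int:
    """Number of unique elements in the formula."""
    return len(element_symbols(formula))


def element_formula(formula: str) -> str:
    """The formula with the numbers dropped, upper-cased and space-separated."""
    return " ".join(symbol.upper() for symbol in element_symbols(formula))


print(element_formula("C22 H24 F N3 O2"), element_count("C22 H24 F N3 O2"))
```

For an illustrative (hypothetical) formula written as "Al Co La O3", `element_formula` returns "AL CO LA O", the same shape as the /ELF search term shown in the list above.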
Ring System Data and Ring Indexes
The Ring Identifier information (RID) lets you search a database for everything from the number of rings in a substance to the Ring Formula (minus hydrogens). The Registry File now has much information about rings that can be searched online, such as the Elemental Sequence for the Smallest Ring (/ESS), the number of rings in the ring system (/NRRS), etc. These search techniques can be valuable in refining a substance search in the Registry File. See the REGISTRY Database Summary Sheet for more options.
The Ring Systems Handbook provides an easy way to find the Heading Parent name for ring compounds. This name can then be used in the printed CA "Chemical Substance Index" or, for an online search, either the name or the Registry Number can be used to retrieve the Registry File record. It is important to know that the compound found in the Ring Systems Handbook may not actually exist. That is, there may be no information in the CA File on the substance. When a new ring system is identified, the substituents are stripped off, and a new ring system entry placed in the RSH.
The access to the entries in the Ring Systems Handbook is by name or ring analysis (and then by molecular formula of the rings making up the compound, ignoring hydrogens). The main part of the set is arranged by the number of rings comprising the compounds and the individual sizes of the smallest set of smallest rings. Thus, the number of component rings, the sizes of those rings, and the elements comprising them are enough information to find a ring compound. A section in the main body of the work might be labeled:
2 RINGS: 5,6 C4N-C6
We would find in the section an entry for 1H-Indole [120-72-9], with the molecular formula C8H7N and a 2-dimensional structural drawing of the molecule.
It would not be too difficult then to assign the proper Chemical Abstracts Index name for isatin: 1H-Indole-2,3-dione
Chemical Abstracts includes an "Index of Ring Systems" with each Formula Index, beginning with the 7th Collective Index period (1962-66).
Compound Class Identifiers
There are a number of other indexes that can be used in an online search of the Registry File, e.g., Compound Class Identifiers (/CI).
|Incompletely Defined Substance||IDS|
|Manually Registered Substance||MAN|
An example of the use of the CI field in command-level searching is:
Such searches are of use in combination with other Registry File searches in order to narrow an answer set. See the REGISTRY Database Summary Sheet for additional possibilities.
NLM's Online Chemical Dictionary Files, PubChem and ChemSpider
Databases such as the Registry File are referred to as ONLINE CHEMICAL DICTIONARY FILES. They exist to help you identify substances, to gather like substances into a set, and to discover which files on the database vendor's system have information on the substance(s).
In the past there was an online chemical dictionary file from the National Library of Medicine. Although not nearly as large as the Registry File, NLM's CHEMLINE file contained over 1,360,000 records as of mid-1995. Work ceased on the CHEMLINE file in 1998. NLM publishes Supplementary Concept Records (formerly, Supplementary Chemical Records). It was an annual printed compilation for many years that contained all of the compound names used in indexing records in the Medline system. See the record on this page for a summary of the data fields included in the Supplementary Concept Records. Various Medical Subject Heading (MeSH) files are available for download.
A smaller NLM file is ChemIDplus, with nearly 380,000 compounds, over 263,000 of which have structure data. There is also a ChemIDplus Lite version for those who just need to do name or Registry Number searching and do not want to use a plugin or applet. An important feature of the ChemIDplus file is the link to SuperList. SuperList designates a collection of lists of chemical substances maintained by key federal and state government regulatory agencies, as well as by scientific organizations concerned with health and environmental hazards of chemical substances. ChemIDplus provides directory assistance to those lists. Searching the NLM files is considerably cheaper than searching the CAS Registry file.
Unlike CAS, the National Library of Medicine has attempted to group compounds with related substances in their index in a hierarchical fashion. From 1963 through 1995, a chemical was generally "treed" in two places: in one Tree showing its chemical structure and in a second Tree under its function, or pharmacological action. The arrangement of chemical headings in MeSH (Medical Subject Headings) has not changed, but NLM no longer puts all drugs under the functional trees.
The NIH's PubChem is a free database covering over 27 million unique substances. PubChem has numerous search options, including the capability to search by InChI, the IUPAC International Chemical Identifier. PubChem includes substance information, compound structures, and bioactivity data in three primary databases, PCSubstance, PCCompound, and PCBioAssay, respectively.
The RSC's ChemSpider is also a free database containing around 25 million compounds from 400 data sources.
The easiest way to search in ChemSpider is to use a common name or tradename. For example, benzyl azide is a versatile reaction intermediate. What information can I find about this compound in ChemSpider?
STEP 1 Go to www.chemspider.com. On the home page there is a search box; simply type the name of the compound of interest and click Search. Alternatively, select the Search tab from the top toolbar and choose Simple Search from the drop-down menu.
STEP 2 Look at the results. The default record view will give you the structure, SMILES, InChIKey, alternative names, and synonyms.
Scroll down the record view to see more information. The record view comprises a number of info boxes which may include a number of different tabs indicating the different pieces of information that are available.
In the Associated data sources box, for example, the data sources that are commercial vendors from whom you can purchase the chemicals are indicated with a shopping cart. Other sources may include links to biological data, toxicology data, physical properties, spectral data, and safety data.
Scroll down the record to view all of the different sections of the page (if they aren’t visible click on the ‘expand’ icon in the section heading to expand them).
There will be info boxes for links to patent information from SureChem and literature links providing access to RSC journals, books, and databases. The Search Google Scholar link will enable you to expand a search into the wider scientific literature based on the approved names and synonyms in ChemSpider.
Records may also have a link to reactions in ChemSpider SyntheticPages. You can view the full article in CS|SP at http://cssp.chemspider.com
There is also a link to spectral data. This can be 1H NMR, 13C NMR, IR, or mass spectra. The spectra can be viewed in a Java applet and can also be downloaded.
Beilstein and Gmelin
The factual databases Beilstein and Gmelin are organized a little differently. Structure searching is the most appropriate way to find information in these sources on Reaxys. Although both can be queried using chemical name and formula searching, formula searching is actually the most appropriate approach for the inorganic compounds in Gmelin.
Searching by Name
In both Beilstein and Gmelin, there is a field "Chemical Name (CN)" containing the chemical names of the substances in the databases. Select the field from the datastructure or use it in advanced mode like cn=*searchterm*. Truncation can be used left and right. It is advised to use the list (expand) function to look for different spellings of the same name that might be found from different authors in different publications.
The field "Chemical Name Segment" contains the fragmented pieces of the field CN. Querying for "Indole" using this field retrieves a list of compounds containing the term "Indole" in their chemical name.
While these two fields contain the names and name fragments of registered substances, the field "All Chemical Names" also includes the names of solvents, derivatives, and other fields with chemical names, and thus allows a broader approach to searches using chemical names.
Searching by Formula
Using molecular formula for searches in Beilstein can be a very powerful option, and there are a few options for such a search.
The field Molecular Formula (MF) contains exact molecular formula for single- and multi-fragment compounds. It is calculated from the chemical structure in Hill order, with no charge or isotope information. For multi-fragment compounds like salts the molecular formulas corresponding to the individual fragments are separated from each other by an asterisk and have normalized stoichiometric multipliers.
For the sodium salt of Isatin the Molecular Formula accordingly is C8H4NO2*Na
Positional isomers can be searched very effectively when the molecular formula is combined with a Lawson Number (LN).
The field Linear Structure Formula (LSF) adds the option to explicitly include charges or isotope labels with the exception of Deuterium and Tritium.
For the above mentioned Isatin salt this would be C8H4NO2(1-)*Na(1+)
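The relationship between the two fields can be illustrated by stripping the charge annotations from a Linear Structure Formula to recover the plain Molecular Formula. This sketch assumes charges are always written as a number and a sign in parentheses, as in the isatin example. Isotope labels are not handled, and the function name `lsf_to_mf` is hypothetical.

```python
import re

# A charge annotation such as (1-) or (2+).
CHARGE = re.compile(r"\(\d+[+-]\)")


def lsf_to_mf(lsf: str) -> str:
    """Drop charge annotations, leaving the asterisk-separated fragments."""
    return CHARGE.sub("", lsf)


mf = lsf_to_mf("C8H4NO2(1-)*Na(1+)")
print(mf, mf.split("*"))
```

Splitting the result on the asterisk gives the individual fragments of a multi-fragment compound.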
The field "Search MF Range" allows searching for derivatives of a certain carbon skeleton or for ranges in the molecular formula. Thus queries like "C(2-4) H(4-8)" or "C8 H7 *" are allowed using this field. Note that it is not possible to use larger or less than signs or symbols. If you want to require more than 3 oxygens to be present in the resulting structures, use "O(4-99)".
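Because greater-than and less-than signs are not accepted, open-ended conditions have to be rewritten as explicit ranges. The helper functions below sketch how such query strings might be assembled. They only build text in the notation quoted above and are hypothetical names, not part of any Reaxys interface. The upper bound of 99 follows the example in the text.

```python
def mf_range(ranges: dict) -> str:
    """Build a range query such as 'C(2-4) H(4-8)' from {symbol: (low, high)}."""
    terms = []
    for symbol, (low, high) in ranges.items():
        if low > high:
            raise ValueError(f"empty range for {symbol}: {low}-{high}")
        terms.append(f"{symbol}({low}-{high})")
    return " ".join(terms)


def more_than(symbol: str, n: int, upper: int = 99) -> str:
    """Express 'more than n atoms' as an explicit range, e.g. O(4-99)."""
    return f"{symbol}({n + 1}-{upper})"


print(mf_range({"C": (2, 4), "H": (4, 8)}))
print(more_than("O", 3))
```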
For Gmelin, searching by molecular formula is the method of choice, especially when it comes to inorganic compounds.
Chemical nomenclature is an area of expertise claimed by few chemists today, but there are powerful search capabilities in databases and printed reference works that make use of chemical names, both trivial and formal names. On the other hand, all chemists use molecular formulas, and a system such as the Hill System for arranging molecular formulas in an index provides a useful retrieval mechanism. Chemical Abstracts Service uses the Registry Number to index documents in the CA database. Many tags have been developed to use with the Registry Numbers for more effective searching in the CA databases. An increasingly popular search site is the PubChem database, and the Beilstein and Gmelin databases are useful complements to the others.