Protein sequence determination techniques


The amino acid sequence of a protein is a valuable source of insight into its function, structure, and history:
1. The sequence of a protein can be compared with other known sequences to determine whether significant similarities exist. Using computers, a search for kinship between a newly sequenced protein and the millions of previously sequenced ones takes only seconds. If the newly isolated protein is a member of an established class of proteins, we can infer information about its structure and function.
2. Comparing the sequences of the same protein in different species reveals evolutionary pathways. Genealogical relationships between species can be inferred from the differences between their protein sequences. If the mutation rate of a protein is assumed to be constant, comparing its sequences in different species can also estimate when two evolutionary lines diverged. For example, comparison of serum albumins from primates indicates that humans and African apes diverged 5 million years ago, rather than 30 million years ago as had been thought.
3. Amino acid sequences can be searched for internal repeats. Such repeats reveal the history of the protein: many proteins have arisen by duplication of primordial genes followed by diversification.
4. Many proteins contain amino acid sequences that serve as signals designating their destinations or controlling their processing.
5. Sequences provide a basis for preparing antibodies specific for a protein of interest. Parts of an amino acid sequence can elicit antibodies when injected into a mouse or rabbit. Such specific antibodies are useful, for example, for determining the amount of a protein in a blood sample.
6. Amino acid sequences are valuable for making DNA probes specific for the genes that encode the corresponding proteins. Knowing the primary structure thus permits the use of reverse genetics: DNA sequences corresponding to part of the amino acid sequence can be constructed on the basis of the genetic code. These DNA sequences can be used as probes to isolate the gene encoding the protein, so that the entire sequence can be determined. The gene, in turn, can provide information about the physiological regulation of the protein.
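Because most amino acids are encoded by more than one codon, a probe designed from a stretch of protein sequence is really a mixture of DNA sequences. The sketch below is a simplified illustration, not a probe-design tool: it counts how many DNA sequences could encode a peptide (the product of the number of codons for each residue in the standard genetic code) and picks the stretch of a protein with the fewest. The function names and the example sequence are hypothetical.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def least_degenerate_window(protein: str, length: int) -> str:
    """Return the stretch of `length` residues whose probe mixture is smallest."""
    windows = [protein[i:i + length] for i in range(len(protein) - length + 1)]
    return min(windows, key=probe_degeneracy)


if __name__ == "__main__":
    protein = "LSRMWDKYLS"  # hypothetical sequence
    best = least_degenerate_window(protein, 5)
    print(best, probe_degeneracy(best))  # MWDKY 8
```

Regions rich in methionine and tryptophan, each encoded by a single codon, give the least degenerate probes.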

Determining the amino acid sequences

1. Hydrolysis:
The peptide is heated in 6 M hydrochloric acid (HCl) at 110 °C for 24 hours. This hydrolyzes the peptide bonds, breaking the chain down into its constituent amino acids.

2. Separation
The amino acids in the hydrolysate are separated by ion-exchange chromatography on a column of sulfonated polystyrene resin, eluting with buffers of increasing pH. Each amino acid emerges at a characteristic elution volume, which identifies it: amino acids with the most acidic side chains emerge first, and those with the most basic side chains emerge last. The amount of each amino acid (for example, whether it is present as one, two, or three residues) can then be determined from the absorbance of its peak.
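Once each peak has been quantified, whole-number residue counts can be estimated by dividing every amount by that of the least abundant amino acid. The sketch below assumes that the least abundant amino acid occurs exactly once; the measured amounts are hypothetical.

```python
def residue_counts(nmol_by_aa: dict[str, float]) -> dict[str, int]:
    """Convert measured amounts (nmol) into whole-number residue counts,
    assuming the least abundant amino acid occurs exactly once."""
    smallest = min(nmol_by_aa.values())
    return {aa: round(amount / smallest) for aa, amount in nmol_by_aa.items()}


if __name__ == "__main__":
    # Hypothetical amounts read from the chromatogram peaks
    measured = {"Asp": 10.2, "Gly": 20.5, "Lys": 29.8, "Phe": 9.9}
    print(residue_counts(measured))  # {'Asp': 1, 'Gly': 2, 'Lys': 3, 'Phe': 1}
```

If every amino acid in the peptide occurs at least twice, this assumption fails and all the counts must be scaled up by the same factor.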

3. Quantitation
Amino acids from a peptide are quantified by reacting them with ninhydrin, which can detect as little as a microgram of an amino acid. Most amino acids give an intense blue color; proline gives a yellow color because it has a secondary amino group. To detect as little as a nanogram of an amino acid, fluorescamine can be used instead; it reacts with the α-amino group to yield a highly fluorescent product. The amount of amino acid is proportional to the optical absorbance of the ninhydrin-treated sample or to the fluorescence of the fluorescamine-treated sample.
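Because the absorbance (or fluorescence) is proportional to the amount of amino acid, an unknown sample can be quantified against a calibration line measured with standards of known amount. The following is a minimal least-squares sketch; the standard and sample readings are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_absorbance(absorbance: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate the amount in the sample."""
    return (absorbance - intercept) / slope


if __name__ == "__main__":
    standards_nmol = [0.0, 10.0, 20.0, 40.0]   # hypothetical standards
    standards_abs = [0.00, 0.15, 0.30, 0.60]   # hypothetical absorbance readings
    slope, intercept = fit_line(standards_nmol, standards_abs)
    print(f"{amount_from_absorbance(0.45, slope, intercept):.1f} nmol")  # 30.0 nmol
```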

Amino acid analysis reveals only the composition of a peptide, not the order of its amino acids. Edman degradation is used to determine that order.
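The difference between composition and sequence is easy to see if a peptide is modeled as a string of one-letter codes: complete hydrolysis followed by amino acid analysis behaves like counting the letters, which discards their order. The two peptides in this illustrative sketch are hypothetical.

```python
from collections import Counter


def composition(peptide: str) -> Counter:
    """What amino acid analysis reports: how many of each residue, not their order."""
    return Counter(peptide)


if __name__ == "__main__":
    first, second = "GAKF", "KFAG"  # hypothetical peptides
    print(composition(first) == composition(second))  # True: same composition
    print(first == second)                            # False: different sequences
```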

Once the amino acid composition is known, the amino acid sequence is determined. This can be done using two complementary methods:

1. Edman Degradation

Edman degradation uses phenyl isothiocyanate, and the labeled terminal residue is released under acidic conditions.
This method removes amino acids one at a time from the amino terminus. The reagent used is phenyl isothiocyanate, which reacts with the free terminal amino group of the peptide to form a phenylthiocarbamoyl derivative. Under mildly acidic conditions, a cyclic derivative of the terminal residue is released, leaving the rest of the peptide intact but shortened by one residue. This cyclic compound, a phenylthiohydantoin (PTH)-amino acid (e.g., PTH-glycine), is then identified by chromatography. Edman degradation is simple to perform because the sequencer is automated, but it is not effective for long peptides (more than about 50 residues): no cycle is 100% efficient, so errors accumulate over many cycles, and each cycle takes about an hour.
One example of such a chromatographic procedure is high-pressure liquid chromatography (HPLC). The PTH-amino acid released in each cycle is separated from other components, and its identity is established from its elution time and absorbance, compared with those of known standards.
The main advantage of Edman degradation is that each cycle removes only the amino-terminal residue, leaving all other peptide bonds intact. The development of automated sequencers has also made polypeptide sequencing much quicker and more efficient.
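The Edman cycle can be modeled in a few lines: each cycle reports the current N-terminal residue and hands the shortened peptide to the next cycle. The second function illustrates why long peptides are a problem, assuming, for illustration only, that every cycle is 98% efficient, so the fraction of molecules still in step after n cycles is 0.98**n. Both functions are hypothetical sketches, not a model of a real sequencer.

```python
def edman_cycles(peptide: str, cycles: int) -> tuple[str, str]:
    """Idealized Edman degradation.

    Each cycle removes only the N-terminal residue (identified as its
    PTH derivative) and leaves the rest of the chain intact.
    Returns (residues identified so far, peptide remaining).
    """
    identified = []
    remaining = peptide
    for _ in range(min(cycles, len(peptide))):
        identified.append(remaining[0])  # released as a PTH-amino acid
        remaining = remaining[1:]        # shortened peptide enters the next cycle
    return "".join(identified), remaining


def fraction_in_step(efficiency: float, cycles: int) -> float:
    """Fraction of molecules that completed every cycle, for a constant per-cycle efficiency."""
    return efficiency ** cycles


if __name__ == "__main__":
    print(edman_cycles("GAKFW", 3))             # ('GAK', 'FW')
    print(f"{fraction_in_step(0.98, 50):.2f}")  # 0.36
```

With the assumed 98% efficiency, only about 36% of the molecules are still in step after 50 cycles, so the signal of the correct residue becomes hard to distinguish from the background produced by out-of-step molecules.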

Controlled cleavage using chemicals and enzymes

For a list of specific cleavage enzymes and chemicals, see [1].
Some chemicals and enzymes cleave peptides at specific sites (e.g., on the amino or carboxyl side of particular amino acids). Using these reagents, a long peptide can be cut into fragments short enough to be sequenced by Edman degradation. When two or more reagents that cleave at different positions are used, the fragments from one digest overlap those from another, and their sequences (each determined by Edman degradation) can be put together like the pieces of a jigsaw puzzle.
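The jigsaw-puzzle logic can be sketched with two example cleavage rules: trypsin, which cleaves on the carboxyl side of lysine (K) and arginine (R), and cyanogen bromide (CNBr), which cleaves on the carboxyl side of methionine (M). The sketch simplifies trypsin's specificity (real trypsin does not cleave before proline, for example), assumes the fragment sequences are already known from Edman degradation, and reconstructs the peptide by brute force: it tries every order of the trypsin fragments and keeps those whose CNBr digest matches the observed fragments. The peptide and helper names are hypothetical, and brute force is only practical for a handful of fragments.

```python
from itertools import permutations


def cleave_after(peptide: str, residues: str) -> list[str]:
    """Cut the peptide on the carboxyl side of every residue listed in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(peptide):
        if aa in residues:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    if start < len(peptide):
        fragments.append(peptide[start:])
    return fragments


def reconstruct(trypsin_fragments: list[str], cnbr_fragments: list[str]) -> list[str]:
    """Return every ordering of the trypsin fragments consistent with the CNBr digest."""
    target = sorted(cnbr_fragments)
    solutions = set()
    for order in permutations(trypsin_fragments):
        candidate = "".join(order)
        if sorted(cleave_after(candidate, "M")) == target:
            solutions.add(candidate)
    return sorted(solutions)


if __name__ == "__main__":
    peptide = "AMKGLRWMDKSF"               # hypothetical; treated as unknown below
    tryptic = cleave_after(peptide, "KR")  # ['AMK', 'GLR', 'WMDK', 'SF']
    cnbr = cleave_after(peptide, "M")      # ['AM', 'KGLRWM', 'DKSF']
    print(reconstruct(tryptic, cnbr))      # ['AMKGLRWMDKSF']
```

Neither digest alone fixes the order, but each CNBr fragment spans a trypsin cleavage point, which is what makes the reconstruction unique here.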

Limitations of Edman Degradation

Even peptides shorter than 50 amino acids can be problematic for Edman degradation. For example, the N-terminal residue may be inaccessible because it is buried inside the protein or otherwise sequestered. Edman degradation may also fail because of post-translational modifications such as glycosylation, acetylation, phosphorylation, and fatty acid addition. For example, formylation of the N-terminal amino group prevents its reaction with phenyl isothiocyanate. In particular, a disulfide bridge between two cysteine (Cys) residues can complicate sequencing by sequestering the N-terminus or by sterically hindering cyclization of the phenylthiourea intermediate. This can be overcome by reducing the disulfides (with β-mercaptoethanol or DTT), oxidizing the cysteine side chains to their corresponding sulfonic acids with performic acid to prevent disulfide re-formation, and then sequencing as usual.

2. Mass Spectrometry

Mass spectrometry is another technique that can be used to determine protein sequence. Typically, the protein is first cleaved by specific enzymes, and the masses of the resulting fragments are used to identify the parent protein. In a time-of-flight instrument, ions are generated by a laser pulse, accelerated by an electric field, and travel through a flight tube to the detector; the mass-to-charge ratio of each ion is obtained from its time of flight. Ions of the same charge experience the same accelerating force, so by Newton's second law (F = ma) lighter ions reach higher velocities and arrive at the detector first. The recorded mass spectrum is then analyzed and compared against a database of sequenced proteins. If the process is repeated with different enzymes, the fragments become smaller and overlap, and the overlapping fragments can be used to establish the order of the sequence in detail.
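The time-of-flight relationship can be made explicit. An ion of mass m and charge z·e accelerated through a potential U gains kinetic energy zeU = ½mv², so it crosses a field-free tube of length L in t = L·sqrt(m / (2zeU)); flight time grows with the square root of the mass. The sketch below assumes a 1 m tube and a 20 kV accelerating potential (illustrative values, not from the source) and ignores the time spent in the acceleration region.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def flight_time(mass_da: float, charge: int = 1,
                voltage_v: float = 20_000.0, tube_length_m: float = 1.0) -> float:
    """Time (s) for an ion to drift through the flight tube after acceleration.

    Kinetic energy gained: z * e * U = (1/2) * m * v**2, so v = sqrt(2 z e U / m).
    """
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE_C * voltage_v / mass_kg)
    return tube_length_m / velocity


if __name__ == "__main__":
    for mass in (1_000.0, 4_000.0):
        print(f"{mass:>6.0f} Da: {flight_time(mass) * 1e6:.1f} microseconds")
```

Because t is proportional to the square root of m, an ion four times heavier takes twice as long to reach the detector.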

1-Fluoro-2,4-dinitrobenzene is used in Sanger's reaction to identify the amino-terminal residue of a peptide.
Ninhydrin is used to detect amino acids after hydrolysis.

Limitations of Mass Spectrometry

One limitation of mass spectrometry for protein sequencing arises when different amino acids have the same mass. For example, leucine and isoleucine have the same molecular weight (131.17 g/mol), so mass spectrometry yields identical data for the two and cannot distinguish between them.
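The leucine/isoleucine ambiguity shows up directly when peptide masses are calculated from residue masses. The sketch below uses standard monoisotopic residue masses (in daltons) and adds one water molecule for the free termini; it ignores charge and modifications, and the example peptides are hypothetical.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    for p in ("PEPTLDE", "PEPTIDE"):  # hypothetical peptides differing only at L/I
        print(p, round(peptide_mass(p), 4))
```

Both peptides print the same mass, so mass alone cannot tell which residue occupies that position.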


[1] Berg, Jeremy M. Biochemistry, 7th ed., pp. 79-80.