Proteomics/Protein Primary Structure/Genetic Code


Transcription is the process by which messenger RNA (mRNA) is synthesized from information encoded in DNA. It is the first step in gene expression and an important point of cellular control. By regulating transcription, a cell changes how much of a gene's DNA is expressed as RNA, which in turn shapes the proteome: raising expression increases the amount of the protein product, and lowering expression limits it. RNA polymerase reads the DNA template strand in the 3’ → 5’ direction and adds nucleotides to the growing mRNA strand in the 5’ → 3’ direction. RNA nucleotides differ slightly from those of DNA: they contain ribose rather than deoxyribose, and uracil (U) takes the place of thymine (T). The resulting RNA strand therefore has the same sequence as the DNA coding strand (with U substituted for T) and is complementary to the DNA template strand.[1]
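The strand relationships described above can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `coding_strand` and the example template sequence are assumptions made for this sketch, not part of any standard library, and the template is taken to be written in the 3' → 5' direction.

```python
# Base-pairing rules for building RNA on a DNA template (A pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base-pairing rules between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the mRNA, written 5'->3', for a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the DNA coding strand, written 5'->3', for the same template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())


template = "TACGGCATT"        # hypothetical template strand, written 3'->5'
mrna = transcribe(template)   # "AUGCCGUAA", written 5'->3'

# The mRNA matches the coding strand once T is replaced by U.
assert mrna == coding_strand(template).replace("T", "U")
```

Because the template is read 3' → 5' while the mRNA grows 5' → 3', no reversal is needed here: position by position, each template base is simply paired with its complement.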

Transcription can be divided into three processes:[1] initiation, elongation, and termination.


Simple diagram of transcription initiation. RNAP = RNA polymerase

In bacteria, initiation occurs in the cytoplasm when RNA polymerase binds to the DNA at a promoter site, as shown in the initiation diagram, and then separates a short stretch of the DNA double helix (bacterial RNA polymerase opens the helix itself and does not need a separate helicase). RNA polymerase is accompanied by a variety of additional molecules that together form the machinery needed to transcribe a gene.

The initiation process in eukaryotes occurs in the nucleus and is more complex because it is more finely regulated. To begin with, eukaryotes have several different RNA polymerase enzymes, of which RNA polymerase II is responsible for transcribing mRNA. There is no single simple promoter sequence for RNA polymerase to bind, as there is in bacteria; instead, a combination of promoter sequences and various transcription factors is required for RNA polymerase to bind.[1] There are also repressors, proteins that bind DNA near a promoter and prevent initiation of transcription. A gene product may itself be a repressor, producing feedback inhibition: as the concentration of the product builds, further production is slowed.[2]


Simple diagram of transcription elongation

Elongation is the process by which RNA polymerase adds complementary bases to the growing mRNA strand in the 5' → 3' direction (reading the template strand 3' → 5'), one at a time, until it reaches a termination point. The energy for this process comes from the nucleoside triphosphates (NTPs) that are the building blocks of mRNA: for each base added to the growing chain, one NTP is converted to the corresponding nucleoside monophosphate (NMP), which is incorporated into the chain, plus inorganic pyrophosphate.[3] As RNA polymerase travels along the template strand, the DNA double helix is "unzipped" just ahead of it, as shown in the elongation diagram.

During elongation, RNA polymerase alternately works at full speed for approximately 100 bases and then pauses. This stop-and-start pattern has led scientists to conclude that RNA polymerase employs proofreading mechanisms.[4] There is also evidence of RNA polymerase "backtracking", or going back over bases that are already paired, which is likewise thought to be part of a proofreading mechanism.[5] In bacteria, two such proofreading mechanisms are known: in pyrophosphorolytic editing the incorrect base is removed immediately, and in hydrolytic editing RNA polymerase must backtrack to fix an incorrect pairing.[6] Less is known about eukaryotic proofreading methods.


Simple diagram of transcription termination

Termination is the end of transcription, when an mRNA strand is fully formed. As shown in the termination diagram, both the newly formed mRNA and the RNA polymerase complex dissociate from the DNA template. Termination in prokaryotes takes one of two forms: Rho-independent (intrinsic) termination and Rho-dependent termination. Rho is a protein factor that, in Rho-dependent termination, binds the RNA and hydrolyzes ATP, destabilizing the link between the template strand and the newly formed mRNA and releasing the mRNA. In Rho-independent termination, a termination sequence on the template strand causes RNA polymerase to cease its activity and fall off the template.[6]

In eukaryotes, termination is more complex and not as well understood. Some eukaryotic transcription proceeds well past the termination point, and the additional nucleotides are later removed from the transcript; some is mediated by termination factors; and some methods of termination are still uncharacterized.[7] After termination in eukaryotes, the mRNA very commonly undergoes post-transcriptional modifications. These include the addition of a 5' cap, which helps stabilize the mRNA, and the addition of a poly-A tail, which is also found in a select few prokaryotes. The poly-A tail protects the mRNA from degradation and facilitates its export from the nucleus.

Reverse Transcription

Reverse transcription is essentially the copying of RNA information into DNA form. The enzyme that accomplishes this, reverse transcriptase, is found in retroviruses such as HIV. A retrovirus uses reverse transcription to copy its RNA-based genetic material into DNA, which can then be integrated into the genome of the host it is infecting.[8] Reverse transcription is not limited to retroviruses: it is also used by some DNA-based viruses, in eukaryotes to propagate retrotransposons, and in prokaryotes to synthesize msDNA (multicopy single-stranded DNA).[9]
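At the sequence level, copying RNA into DNA amounts to taking the reverse complement of the RNA with T in place of U. The sketch below illustrates only that relationship; the function name `reverse_transcribe` and the example sequence are assumptions made for illustration, and nothing about the enzyme's priming or processing steps is modeled.

```python
# Pairing rules for building a DNA strand on an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5to3: str) -> str:
    """Return the complementary DNA (cDNA), written 5'->3', for an RNA written 5'->3'."""
    # The new strand is antiparallel to the RNA, so the RNA is walked from its
    # 3' end while each base is paired with its DNA complement.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3.upper()))


rna = "AUGCCGUAA"                # hypothetical RNA sequence, written 5'->3'
cdna = reverse_transcribe(rna)   # "TTACGGCAT", written 5'->3'
```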

The Genetic Code

The Standard Genetic Code

The genetic code is the language by which the information encoded in a nucleotide sequence is translated into the peptides that ultimately form proteins. An RNA chain is read as a series of three-nucleotide sequences known as codons, each of which is associated with an amino acid or a stop signal. The following is the standard genetic code table (some organisms use different mappings), showing all possible three-nucleotide combinations and the amino acid associated with each:

RNA codon table

This is a table of each codon and its corresponding amino acid.[10]
1st base  2nd base U                     2nd base C             2nd base A                 2nd base G              3rd base

U         UUU (Phe/F) Phenylalanine      UCU (Ser/S) Serine     UAU (Tyr/Y) Tyrosine       UGU (Cys/C) Cysteine    U
U         UUC (Phe/F) Phenylalanine      UCC (Ser/S) Serine     UAC (Tyr/Y) Tyrosine       UGC (Cys/C) Cysteine    C
U         UUA (Leu/L) Leucine            UCA (Ser/S) Serine     UAA Stop (Ochre)           UGA Stop (Opal)         A
U         UUG (Leu/L) Leucine            UCG (Ser/S) Serine     UAG Stop (Amber)           UGG (Trp/W) Tryptophan  G

C         CUU (Leu/L) Leucine            CCU (Pro/P) Proline    CAU (His/H) Histidine      CGU (Arg/R) Arginine    U
C         CUC (Leu/L) Leucine            CCC (Pro/P) Proline    CAC (His/H) Histidine      CGC (Arg/R) Arginine    C
C         CUA (Leu/L) Leucine            CCA (Pro/P) Proline    CAA (Gln/Q) Glutamine      CGA (Arg/R) Arginine    A
C         CUG (Leu/L) Leucine            CCG (Pro/P) Proline    CAG (Gln/Q) Glutamine      CGG (Arg/R) Arginine    G

A         AUU (Ile/I) Isoleucine         ACU (Thr/T) Threonine  AAU (Asn/N) Asparagine     AGU (Ser/S) Serine      U
A         AUC (Ile/I) Isoleucine         ACC (Thr/T) Threonine  AAC (Asn/N) Asparagine     AGC (Ser/S) Serine      C
A         AUA (Ile/I) Isoleucine         ACA (Thr/T) Threonine  AAA (Lys/K) Lysine         AGA (Arg/R) Arginine    A
A         AUG (Met/M) Methionine, Start  ACG (Thr/T) Threonine  AAG (Lys/K) Lysine         AGG (Arg/R) Arginine    G

G         GUU (Val/V) Valine             GCU (Ala/A) Alanine    GAU (Asp/D) Aspartic acid  GGU (Gly/G) Glycine     U
G         GUC (Val/V) Valine             GCC (Ala/A) Alanine    GAC (Asp/D) Aspartic acid  GGC (Gly/G) Glycine     C
G         GUA (Val/V) Valine             GCA (Ala/A) Alanine    GAA (Glu/E) Glutamic acid  GGA (Gly/G) Glycine     A
G         GUG (Val/V) Valine             GCG (Ala/A) Alanine    GAG (Glu/E) Glutamic acid  GGG (Gly/G) Glycine     G
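To show how the table is used, the following Python sketch builds the same 64-codon mapping and reads an mRNA three bases at a time. It is a minimal illustration under simplifying assumptions: translation is taken to begin at the first AUG found and to end at the first in-frame stop codon, amino acids are written in one-letter codes with `*` standing for a stop, and the names `CODON_TABLE` and `translate` are chosen for this example only.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in table order (first base, then second, then
# third, each running U, C, A, G); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to, but not including, the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical message: two leading bases, then AUG CCG UUU UAA, then two more.
peptide = translate("GGAUGCCGUUUUAAGC")   # "MPF"
```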

Degeneracy of the Code

As is apparent in the table above, there are far more three-nucleotide combinations (64) than there are standard amino acids (20), so several codons may code for the same amino acid. This property of the genetic code is called degeneracy. Most of the degeneracy occurs at the third position of the codon, which can be explained by the wobble hypothesis: a single tRNA can bind to several different codons because pairing at the third nucleotide of the codon is weak. Degeneracy also acts as a simple safeguard against complications arising from genetic mutations, because some errors in replication or transcription still code for the same amino acid, and therefore the same protein, as the undamaged DNA, nullifying the effect of the error; these are commonly referred to as silent mutations. There are also special codons that mark the start and end of translation: AUG, which also codes for methionine, signals the start, and UAA, UAG, and UGA signal the end.[10]
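Degeneracy can be checked directly against the standard table. The sketch below counts how many codons map to each residue and estimates how often a single-base change at a given codon position leaves the residue unchanged. The names `degeneracy` and `silent_fraction` are assumptions for this illustration, all single-base substitutions are weighted equally (real mutation rates are not uniform), and a change from one stop codon to another is counted as silent.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code in table order; one-letter amino acid codes, "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each residue, e.g. six for leucine (L) and
# one each for methionine (M) and tryptophan (W).
degeneracy = Counter(CODON_TABLE.values())


def silent_fraction(position: int) -> float:
    """Fraction of single-base changes at a codon position (0, 1 or 2) that are silent."""
    silent = total = 0
    for codon, residue in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            silent += CODON_TABLE[mutant] == residue
    return silent / total


for position in range(3):
    print(f"codon position {position + 1}: {silent_fraction(position):.1%} silent")
```

Under these assumptions the third position gives by far the largest silent fraction, which is the pattern the wobble hypothesis is meant to explain.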

Organism-Specific Modifications

Although the code shown in the table above is the one most commonly used among organisms, several species have their own modifications to it. One example is the vertebrate mitochondrial genetic code, in which four codons are translated differently than in the nuclear genetic code of the same organism. Deviation from the standard genetic code is also observed in invertebrate mitochondrial codons, in the genomes of several bacteria, and in the nuclear codons of several yeasts.[11]
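A variant code can be represented as the standard table plus a small set of overrides. The sketch below does this for the vertebrate mitochondrial code, using the four reassignments listed for it in the NCBI genetic code tables (AGA and AGG become stops, AUA codes for methionine, and UGA codes for tryptophan). The variable and function names and the example message are assumptions made for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code in table order; one-letter amino acid codes, "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The four codons read differently in vertebrate mitochondria.
VERTEBRATE_MITO_CHANGES = {"AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"}
VERTEBRATE_MITO_CODE = {**STANDARD_CODE, **VERTEBRATE_MITO_CHANGES}


def translate(mrna: str, code: dict) -> str:
    """Translate in frame from the first base until the first stop codon in `code`."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


message = "AUGUGAAUAAGA"                               # AUG UGA AUA AGA (hypothetical)
nuclear = translate(message, STANDARD_CODE)            # "M": UGA is read as a stop
mitochondrial = translate(message, VERTEBRATE_MITO_CODE)  # "MWM": AGA is the stop
```

The same message therefore yields different peptides depending on which code the translating machinery follows.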

Expanding the Genetic Code

Although the standard genetic code is well known, work is still being done to determine the subtle, organism-specific nuances of the code. For instance, two additional amino acids, selenocysteine and pyrrolysine, have been found to be genetically encoded. These two residues are specified by codons that normally act as stop codons (UGA for selenocysteine and UAG for pyrrolysine), but in the presence of specific mRNA signaling sequences they may be added to the growing peptide chain during translation.[12]
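The idea of context-dependent recoding can be sketched as a translation function that accepts a list of codon positions to be read differently. This is a deliberately simplified model: the mRNA signaling sequences themselves are not modeled, so the caller supplies the recoded positions directly through the hypothetical `recoded_sites` argument. The one-letter codes U (selenocysteine) and O (pyrrolysine) follow common usage; the function name and example sequences are assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code in table order; one-letter amino acid codes, "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_recoding(mrna: str, recoded_sites=None) -> str:
    """Translate in frame from the first base; `recoded_sites` maps the index of a
    codon's first base to the residue inserted there instead of the standard reading."""
    recoded_sites = recoded_sites or {}
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = recoded_sites.get(i, STANDARD_CODE[mrna[i:i + 3]])
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


message = "AUGUGAGGCUGA"   # AUG UGA GGC UGA (hypothetical)
plain = translate_with_recoding(message)               # "M": the first UGA stops translation
recoded = translate_with_recoding(message, {3: "U"})   # "MUG": UGA at index 3 read as Sec
```

Only the UGA at index 3 is recoded; the second UGA still terminates translation, mirroring the point that recoding applies to specific sites rather than to every occurrence of the codon.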


  1. a b c
  6. a b
  8. Ooms M, Cupac D, Abbink T, Huthoff H, Berkhout B. "The availability of the primer activation signal (PAS) affects the efficiency of HIV-1 reverse transcription initiation." Nucleic Acids Res (2007) March; 35(5): 1649–1659. Accessed 4/2/2008
  9. Italiani V, Marques MV. "The Transcription Termination Factor Rho Is Essential and Autoregulated in Caulobacter crescentus." Bacteriology (2005); 187(12): 4290–4294. Accessed 4/2/2008
  10. a b
  12. Osawa S, Jukes TH, Watanabe K, Muto A. "Recent evidence for evolution of the genetic code." Microbiol Rev (1992) March; 56(1): 229–264. Accessed 4/8/2008

Created April 2008 by Brent Strong and Phil Plummer

Last modified on 20 July 2009, at 09:31