Structural Biochemistry/Genetic code

Overview

The genetic code is the relationship between the base sequence of DNA and the amino acid sequence of proteins. Its main features are:

each amino acid is encoded by a group of three nucleotides
it is non-overlapping
it has no punctuation between codons (alternatively, the stop codons can be viewed as a "period" or "full stop")
it is degenerate, or redundant

There are 20 amino acids but only 4 bases, so a minimum of three bases is needed to code for each amino acid: single bases could specify only 4 amino acids and pairs of bases only 16 (4 × 4), whereas triplets give 64 (4 × 4 × 4) combinations. The set of three nitrogenous bases that codes for an amino acid is known as a codon. Of the 64 codons, 61 encode amino acids and 3 signal chain termination. The first two letters of a codon largely determine which amino acid, and which class of amino acid, is encoded (for example, the codons for the aromatic amino acids Phe, Tyr, and Trp all begin with U). For this reason, a mistake in the first or second letter is generally more damaging to an organism than one in the third, as it is more likely to produce a completely different class of amino acid.
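The counting argument above can be checked directly. The following is a minimal sketch in Python; the function name `min_codon_length` is illustrative and does not come from the source.

```python
def min_codon_length(n_bases, n_amino_acids):
    """Smallest codon length whose combinations cover every amino acid."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


# Number of distinct codons for codon lengths 1 to 3 with 4 bases.
codon_counts = {length: 4 ** length for length in (1, 2, 3)}
print(codon_counts)             # {1: 4, 2: 16, 3: 64}
print(min_codon_length(4, 20))  # 3
```

Two bases per codon would give only 16 combinations, which is fewer than the 20 amino acids, so three is the smallest workable codon length.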

Nonoverlapping

The genetic code is non-overlapping. For example, in the sequence ABCDEF, ABC codes for the first amino acid and DEF for the second, whereas in an overlapping code ABC could code for the first amino acid and BCD for the second. The genetic code also has no internal punctuation (such as commas or semicolons): there is no separator X between codons, as in XABCXDEFX..., because the sequence is read sequentially from a fixed starting point (although it could be argued that the so-called "stop" codons function as "periods" during translation). Consequently, an insertion or deletion whose length is not a multiple of three results in a frameshift mutation. The reading frame of every codon after the mutation is shifted, which changes the downstream amino acids and often produces a stop codon shortly afterward. Because of this extreme impact, frameshift mutations are often deleterious.
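The difference between the two readings, and the effect of a single inserted base, can be illustrated with a short Python sketch. The helper names (`split_codons`, `first_stop`) and the RNA sequence are made up for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_codons(seq, overlapping=False):
    """Split a sequence into triplets; an incomplete trailing triplet is dropped."""
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def first_stop(codons):
    """Index of the first stop codon in the list, or None if there is none."""
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return index
    return None


print(split_codons("ABCDEF"))                    # ['ABC', 'DEF']
print(split_codons("ABCDEF", overlapping=True))  # ['ABC', 'BCD', 'CDE', 'DEF']

original = "AUGCCUGAGCAUUAA"                  # made-up example sequence
inserted = original[:3] + "A" + original[3:]  # one extra base after the first codon

print(split_codons(original))  # ['AUG', 'CCU', 'GAG', 'CAU', 'UAA']
print(split_codons(inserted))  # ['AUG', 'ACC', 'UGA', 'GCA', 'UUA']
print(first_stop(split_codons(original)), first_stop(split_codons(inserted)))  # 4 2
```

In this example the single-base insertion changes every codon after the first and brings a UGA stop codon into frame at the third position, well before the original stop.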

Degeneracy

The genetic code is degenerate in that most amino acids are encoded by more than one codon; the exceptions are tryptophan and methionine, which each have only one codon. Codons that specify the same amino acid are called synonyms, and synonyms usually differ only in the last base of the triplet. Degeneracy is significant because it reduces the deleterious effects of mutations: many point mutations, which change a single base, either leave the encoded amino acid unchanged or replace it with a similar one, so the protein is altered little, if at all. Of course, genetic information does not depend solely on the genetic code; regulatory sequences, intergenic segments, and structural regions of chromosomes also contribute, and these are not as simple to interpret as the codon table. Another term for the degeneracy of the genetic code is redundancy.
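Degeneracy can be quantified from the standard codon table. The sketch below is one way to do this in Python; it writes the table as a 64-character string of one-letter amino acid symbols in UCAG order, with `*` for stop codons. The names `AMINO`, `CODON_TABLE`, and `synonymous_by_position` are illustrative choices, not part of the source.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the 64 codons in UCAG order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
print(single_codon)  # ['M', 'W'] (methionine and tryptophan)


def synonymous_by_position(table):
    """Count single-base changes of sense codons that keep the amino acid,
    separately for codon positions 1, 2, and 3."""
    counts = [0, 0, 0]
    for codon, aa in table.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if base != codon[pos] and table[mutant] == aa:
                    counts[pos] += 1
    return counts


print(synonymous_by_position(CODON_TABLE))  # [8, 0, 126]
```

Each position allows 183 single-base changes across the 61 sense codons (61 × 3). Of these, 126 are synonymous at the third position, compared with 8 at the first and none at the second, which is why synonyms usually differ in the last base.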

Translation process

Translation is the synthesis of a protein from an mRNA template. The process involves several key molecules, including mRNA, the small and large subunits of the ribosome, tRNA, and a release factor, and it is divided into three stages: initiation, elongation, and termination.

Messenger RNA (mRNA), the template for translation, contains the codons that encode specific amino acids. Eukaryotic mRNA has a methylated cap at its 5’ end and a characteristic 3’ end called the poly(A) tail. Transfer RNA (tRNA) is another key molecule. It contains an anticodon that is complementary to the mRNA codon to which it binds, and the corresponding amino acid is attached to its 3’ end.

Initiation begins when the small subunit of the ribosome attaches to the cap and moves along the mRNA to the translation initiation site. The first codon is typically AUG, which corresponds to methionine. The large subunit of the ribosome then binds, creating the peptidyl (P) site and the aminoacyl (A) site. The first tRNA occupies the P site, while the second tRNA, which is complementary to the second mRNA codon, enters the A site.

During elongation, the methionine is transferred to the amino acid on the A-site tRNA, the first tRNA exits, the ribosome moves along the mRNA by one codon, and the next tRNA enters. As elongation continues, the growing peptide is repeatedly transferred to the A-site tRNA, the ribosome moves along the mRNA, and new tRNAs enter.

Termination occurs when a stop codon is encountered in the A site. A release factor, rather than a tRNA, enters the A site; the newly formed protein is released and the ribosome dissociates.
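The decoding logic of these stages (start at AUG, read one codon at a time, stop at a stop codon) can be sketched in Python. This is a deliberately simplified model that ignores the ribosome, tRNAs, and initiation-site context; the function `translate`, the variable names, and the example sequence are illustrative assumptions.

```python
BASES = "UCAG"
# Rows follow the layout of the standard codon table: four rows per first base,
# one row per third base; the four columns are the second base (U, C, A, G).
TABLE_ROWS = """
Phe Ser Tyr Cys
Phe Ser Tyr Cys
Leu Ser Stop Stop
Leu Ser Stop Trp
Leu Pro His Arg
Leu Pro His Arg
Leu Pro Gln Arg
Leu Pro Gln Arg
Ile Thr Asn Ser
Ile Thr Asn Ser
Ile Thr Lys Arg
Met Thr Lys Arg
Val Ala Asp Gly
Val Ala Asp Gly
Val Ala Glu Gly
Val Ala Glu Gly
"""

CODON_TABLE = {}
for row, line in enumerate(TABLE_ROWS.strip().splitlines()):
    first, third = BASES[row // 4], BASES[row % 4]
    for second, amino_acid in zip(BASES, line.split()):
        CODON_TABLE[first + second + third] = amino_acid


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break  # termination: no amino acid is added for a stop codon
        peptide.append(amino_acid)
    return peptide


print(translate("GGAUGGCUGAAUUUUAAGG"))  # ['Met', 'Ala', 'Glu', 'Phe']
```

Taking the first AUG in the sequence as the start site is a simplification made for this sketch; real initiation-site selection depends on more than the presence of an AUG.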

Table of genetic codes

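Each row gives the first and third bases of a codon, and the columns headed U, C, A, and G give the second base.

First base	U	C	A	G	Third base
U	Phe	Ser	Tyr	Cys	U
U	Phe	Ser	Tyr	Cys	C
U	Leu	Ser	Stop	Stop	A
U	Leu	Ser	Stop	Trp	G
C	Leu	Pro	His	Arg	U
C	Leu	Pro	His	Arg	C
C	Leu	Pro	Gln	Arg	A
C	Leu	Pro	Gln	Arg	G
A	Ile	Thr	Asn	Ser	U
A	Ile	Thr	Asn	Ser	C
A	Ile	Thr	Lys	Arg	A
A	Met	Thr	Lys	Arg	G
G	Val	Ala	Asp	Gly	U
G	Val	Ala	Asp	Gly	C
G	Val	Ala	Glu	Gly	A
G	Val	Ala	Glu	Gly	G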

Universal Code?

Is the genetic code universal across all species? The base sequences of many wild-type and mutant genes are known, and in each case the observed nucleotide and amino acid changes agree with those predicted by the genetic code. Furthermore, mRNA can be translated correctly by the protein-synthesizing machinery of very different species. For example, wheat-germ extract correctly translates human hemoglobin mRNA, and bacteria can express recombinant DNA molecules encoding human proteins such as insulin.

Although these findings suggested that the genetic code is universal across species, this was shown not to be strictly true when the sequence of human mitochondrial DNA became known. Human mitochondria read UGA as a codon for tryptophan rather than as a stop signal. In addition, AGA and AGG are read as stop signals instead of arginine, and AUA codes for methionine rather than isoleucine. The mitochondria of other species were also found to have genetic codes that differ slightly from the standard one. The mitochondrial code can differ from that of the rest of the cell because mitochondrial DNA encodes a distinct set of tRNAs.

At least 16 cellular protein-synthesizing systems also deviate from the standard genetic code. For example, ciliated protozoa read UAA and UAG as codons for amino acids rather than as stop signals and use UGA as their only stop signal. Such slight variations exist in mitochondria and in species that branched off early in eukaryotic evolution. Most variations simplify the code by diminishing the information carried in the third base of the triplet, as when both AUA and AUG are codons for methionine. The genetic code is therefore nearly, but not quite, universal.
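The human mitochondrial reassignments described above can be expressed as a small set of overrides on the standard table. The following Python sketch uses one-letter amino acid symbols with `*` for stop; the names `STANDARD`, `HUMAN_MITO`, and `translate`, and the example sequence, are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the 64 standard codons in UCAG order;
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments described in the text for human mitochondria.
HUMAN_MITO = {**STANDARD, "UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}


def translate(mrna, table):
    """Read codons from the first base until a stop codon or the end of the sequence."""
    peptide = ""
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide += aa
    return peptide


message = "AUGAUAUGAAGA"  # made-up sequence: AUG AUA UGA AGA
print(translate(message, STANDARD))    # MI
print(translate(message, HUMAN_MITO))  # MMW
```

The same message yields different peptides under the two codes: the standard code stops at UGA, whereas the mitochondrial code reads AUA as methionine and UGA as tryptophan, and stops at AGA.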

The near invariance of the genetic code through evolution is likely a result of selection: a mutation that altered the way mRNA is read would change the amino acid sequence of most, if not all, of the proteins made by the organism, and such changes would usually be deleterious and therefore strongly selected against.

References

1. Berg, Jeremy M. 2007. Biochemistry. 6th ed. New York: W.H. Freeman, pp. 125-127.