An Introduction to Molecular Biology
DNA the unit of life

Genes are made from a long molecule called DNA, which is copied and inherited across generations. DNA is made of simple units that line up in a particular order within this large molecule. The order of these units carries genetic information, similar to how the order of letters on a page carries information. The language used by DNA is called the genetic code, which lets organisms read the information in the genes. This information is the set of instructions for constructing and operating a living organism.

Deoxyribonucleic acid (DNA): Deoxyribonucleic acid (/diˌɒksiˌraɪbɵ.njuːˌkleɪ.ɨk ˈæsɪd/), or DNA, is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms (with the exception of RNA viruses). The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, a recipe, or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription. The structure of DNA was first discovered by James D. Watson and Francis Crick. It is the same for all species, comprising two helical chains each coiled round the same axis, each with a pitch of 34 Ångströms (3.4 nanometres) and a radius of 10 Ångströms (1.0 nanometres).

Within cells, DNA is organized into long structures called chromosomes. These chromosomes are duplicated before cells divide, in a process called DNA replication. Eukaryotic organisms (animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts. In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed. The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). Each of these four bases is attached to the sugar/phosphate to form the complete nucleotide, as shown for adenosine monophosphate.^[1]

DNA is a genetic material

Griffith's experiment discovering the "transforming principle" in pneumococcus bacteria.

Griffith's experiment, conducted in 1928 by Frederick Griffith, was one of the first experiments suggesting that bacteria are capable of transferring genetic information through a process known as transformation.
Griffith used two strains of Streptococcus pneumoniae bacteria which infect mice – a type III-S (smooth) and a type II-R (rough) strain. The III-S strain covers itself with a polysaccharide capsule that protects it from the host's immune system, resulting in the death of the host, while the II-R strain does not have that protective capsule and is defeated by the host's immune system. A German bacteriologist, Fred Neufeld, had discovered the three pneumococcal types (Types I, II, and III) and discovered the Quellung reaction to identify them in vitro. Until Griffith's experiment, bacteriologists believed that the types were fixed and unchangeable from one generation to another. In this experiment, bacteria from the III-S strain were killed by heat, and their remains were added to II-R strain bacteria. While neither alone harmed the mice, the combination was able to kill its host. Griffith was also able to isolate both live II-R and live III-S strains of pneumococcus from the blood of these dead mice. Griffith concluded that the type II-R had been "transformed" into the lethal III-S strain by a "transforming principle" that was somehow part of the dead III-S strain bacteria. Today, we know that the "transforming principle" Griffith observed was the DNA of the III-S strain bacteria. While the bacteria had been killed, the DNA had survived the heating process and was taken up by the II-R strain bacteria. The III-S strain DNA contains the genes that form the protective polysaccharide capsule. Equipped with these genes, the former II-R strain bacteria were now protected from the host's immune system and could kill the host. The exact nature of the transforming principle (DNA) was verified in the experiments done by Avery, MacLeod and McCarty and by Hershey and Chase.^[2]

First confirmation:

Alfred Hershey and Martha Chase conducted a series of experiments in 1952 confirming that DNA was the genetic material, which had first been demonstrated in the 1944 Avery–MacLeod–McCarty experiment. These experiments are known as the Hershey–Chase experiments. Although the existence of DNA had been known to biologists since 1869, most of them assumed at the time that proteins carried the information for inheritance. Hershey and Chase conducted their experiments on the T2 phage. The phage consists of a protein shell containing its genetic material. The phage infects a bacterium by attaching to its outer membrane and injecting its genetic material, leaving its empty shell attached to the bacterium.

In their first set of experiments, Hershey and Chase labeled the DNA of phages with radioactive phosphorus-32 (P32) (the element phosphorus is present in DNA but not in any of the 20 amino acids that are the components of proteins). They allowed the phages to infect E. coli, and through several elegant experiments were able to observe the transfer of P32-labeled phage DNA into the cytoplasm of the bacterium. In their second set of experiments, they labeled the phages with radioactive sulfur-35 (S35) (sulfur is present in the amino acids cysteine and methionine, but not in DNA). Following infection of E. coli, they sheared the viral protein shells off the infected cells using a high-speed blender and separated the cells and viral coats using a centrifuge. After separation, the radioactive S35 tracer was observed in the protein shells, but not in the infected bacteria, supporting the hypothesis that the genetic material which infects the bacteria was DNA and not protein.^[3]^[4] Hershey shared the 1969 Nobel Prize in Physiology or Medicine for his “discoveries concerning the genetic structure of viruses.”

Oswald T. Avery, Colin MacLeod, Maclyn McCarty with Francis Crick and James D. Watson ^[5]

Structure of DNA

A GC base pair demonstrating three intermolecular hydrogen bonds.

An AT base pair demonstrating two intermolecular hydrogen bonds.

Two helical strands form the DNA backbone. Another double helix may be found by tracing the spaces, or grooves, between the strands. These voids are adjacent to the base pairs and may provide a binding site. As the strands are not directly opposite each other, the grooves are unequally sized. One groove, the major groove, is 22 Å wide and the other, the minor groove, is 12 Å wide. The narrowness of the minor groove means that the edges of the bases are more accessible in the major groove. As a result, proteins like transcription factors that can bind to specific sequences in double-stranded DNA usually make contacts to the sides of the bases exposed in the major groove. This situation varies in unusual conformations of DNA within the cell, but the major and minor grooves are always named to reflect the differences in size that would be seen if the DNA is twisted back into the ordinary B form.

Base pairing of DNA

Structure of DNA.

Chargaff's rules, formulated by the Austrian chemist Erwin Chargaff, state that DNA from any cell of any organism should have a 1:1 ratio of pyrimidine and purine bases and, more specifically, that the amount of guanine is equal to the amount of cytosine and the amount of adenine is equal to the amount of thymine. This pattern is found in both strands of the DNA.
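The rules can be checked by simple counting. The following is a minimal sketch, not a standard tool: the function names `base_counts` and `obeys_chargaff` are hypothetical, and the short duplex is used only for illustration.

```python
from collections import Counter


def base_counts(strand_a: str, strand_b: str) -> Counter:
    """Count each base across both strands of a double-stranded molecule."""
    return Counter(strand_a.upper() + strand_b.upper())


def obeys_chargaff(counts: Counter) -> bool:
    """True when A equals T and G equals C (which implies purines == pyrimidines)."""
    return counts["A"] == counts["T"] and counts["G"] == counts["C"]


# Illustrative duplex: top strand written 5'->3', bottom strand written 3'->5'.
top = "CTCGTTTGCGCTCTATCG"
bottom = "GAGCAAACGCGAGATAGC"

counts = base_counts(top, bottom)
purines = counts["A"] + counts["G"]
pyrimidines = counts["C"] + counts["T"]

print(dict(counts))
print("A == T and G == C:", obeys_chargaff(counts))
print("purines == pyrimidines:", purines == pyrimidines)
```

Because every A on one strand faces a T on the other, and every G faces a C, the equalities hold automatically for any perfectly paired duplex.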

In molecular biology, two nucleotides on opposite complementary DNA strands that are connected via hydrogen bonds are called a base pair (often abbreviated bp). In the canonical Watson-Crick DNA base pairing, Adenine (A) forms a base pair with Thymine (T) and Guanine (G) forms a base pair with Cytosine (C). In RNA, thymine is replaced by Uracil (U). Alternate hydrogen bonding patterns, such as the wobble base pair and Hoogsteen base pair, also occur—particularly in RNA—giving rise to complex and functional tertiary structures.^[6]

Example

 5'CTCGTTTGCGCTCTATCG3'
 3'GAGCAAACGCGAGATAGC5'
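The pairing rule in the example above can be sketched in a few lines of Python. This is an illustrative sketch only; the names `COMPLEMENT`, `complement` and `reverse_complement` are chosen here and do not come from any particular library.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired base for each position, in the same left-to-right order.

    If the input is written 5'->3', the result reads 3'->5'.
    """
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


top = "CTCGTTTGCGCTCTATCG"
print("5'" + top + "3'")
print("3'" + complement(top) + "5'")
print("5'" + reverse_complement(top) + "3'")
```

The second printed line reproduces the lower strand of the example. The third line is the same strand written 5' to 3', which is how sequences are usually reported, since the two strands are anti-parallel.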

Purine base

The German chemist Emil Fischer gave purine its name (from purum uricum) in 1884. He synthesized it for the first time in 1899 from uric acid, which had been isolated from kidney stones by Scheele in 1776. Aside from DNA and RNA, purines are also components of a number of other important biomolecules, such as ATP, GTP, cyclic AMP, NADH, and coenzyme A. Purine itself has not been found in nature, but it can be produced by organic synthesis. A purine is a heterocyclic aromatic organic compound consisting of a pyrimidine ring fused to an imidazole ring.

Example:

Adenine

Adenine is one of the two purine nucleobases (the other being guanine) used in forming nucleotides of the nucleic acids (DNA or RNA). In DNA, adenine binds to thymine via two hydrogen bonds to assist in stabilizing the nucleic acid structures. Adenine forms adenosine, a nucleoside, when attached to ribose, and deoxyadenosine when attached to deoxyribose. It forms adenosine triphosphate (ATP), a nucleotide, when three phosphate groups are added to adenosine.

Guanine

Guanine, along with adenine and cytosine, is present in both DNA and RNA, whereas thymine is usually seen only in DNA, and uracil only in RNA. In DNA, guanine is paired with cytosine. With the formula C5H5N5O, guanine is a derivative of purine, consisting of a fused pyrimidine-imidazole ring system with conjugated double bonds.

Guanine has two tautomeric forms, the major keto form and rare enol form. It binds to cytosine through three hydrogen bonds. In cytosine, the amino group acts as the hydrogen donor and the C-2 carbonyl and the N-3 amine as the hydrogen-bond acceptors. Guanine has a group at C-6 that acts as the hydrogen acceptor, while the group at N-1 and the amino group at C-2 act as the hydrogen donors.

Pyrimidine base

Chemical structure of thymine

Cytosine with numbered components. Methylation occurs on carbon number 5.

Chemical structure of uracil

Pyrimidine is a heterocyclic aromatic organic compound similar to benzene and pyridine, containing two nitrogen atoms at positions 1 and 3 of the six-member ring. It is isomeric with two other forms of diazine. Three nucleobases found in nucleic acids, cytosine (C), thymine (T), and uracil (U), are pyrimidine derivatives.

Pyrimidine has many properties in common with pyridine. As the number of nitrogen atoms in the ring increases, the ring pi electrons become less energetic, so electrophilic aromatic substitution gets more difficult while nucleophilic aromatic substitution gets easier. An example of the latter reaction type is the displacement of the amino group in 2-aminopyrimidine by chlorine, and its reverse. Reduction in resonance stabilization of pyrimidines may lead to addition and ring cleavage reactions rather than substitutions. One such manifestation is observed in the Dimroth rearrangement. Compared to pyridine, N-alkylation and N-oxidation are more difficult, and pyrimidines are also less basic: the pKa value for protonated pyrimidine is 1.23, compared to 5.30 for pyridine.^[7] Pyrimidine is also found in meteorites, although scientists still do not know its origin. Under UV light, pyrimidine photolytically decomposes into uracil.

Chemical structure of cytosine

Cytosine

Cytosine can be found as part of DNA, as part of RNA, or as a part of a nucleotide. As cytidine triphosphate (CTP), it can act as a co-factor to enzymes, and can transfer a phosphate to convert adenosine diphosphate (ADP) to adenosine triphosphate (ATP). The nucleoside of cytosine is cytidine. In DNA and RNA, cytosine is paired with guanine. However, it is inherently unstable, and can change into uracil (spontaneous deamination). This can lead to a point mutation if it is not repaired by DNA repair enzymes such as uracil glycosylase, which cleaves a uracil in DNA.

Cytosine can also be methylated into 5-methylcytosine by an enzyme called DNA methyltransferase, or be methylated and hydroxylated to make 5-hydroxymethylcytosine. Active enzymatic deamination of cytosine or 5-methylcytosine by the APOBEC family of cytosine deaminases could have both beneficial and detrimental implications for various cellular processes as well as for organismal evolution. The implications of deamination of 5-hydroxymethylcytosine, on the other hand, remain less understood.^[8]

Thymine

Thymine (T, Thy) is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine is also known as 5-methyluracil, a pyrimidine nucleobase. As the name suggests, thymine may be derived by methylation of uracil at the 5th carbon. In RNA, thymine is replaced with uracil in most cases. In DNA, thymine (T) binds to adenine (A) via two hydrogen bonds, thus stabilizing the nucleic acid structures.

Uracil

Uracil is found in RNA, where it base-pairs with adenine and takes the place of thymine during transcription of DNA. Methylation of uracil produces thymine. DNA uses thymine rather than uracil, which helps protect the DNA and improves the efficiency of DNA replication. Uracil can base-pair with any of the bases, depending on how the molecule arranges itself on the helix, but readily pairs with adenine because the methyl group is repelled into a fixed position. Uracil pairs with adenine through hydrogen bonding. Uracil is the hydrogen bond acceptor and can form two hydrogen bonds. Uracil can also bind with a ribose sugar to form the ribonucleoside uridine. When a phosphate attaches to uridine, uridine 5'-monophosphate is produced.
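The substitution of uracil for thymine during transcription can be sketched as follows. This is a simplified illustration, assuming an idealized transcript with no processing; the function names and the example strands are hypothetical.

```python
# RNA base that pairs with each DNA template base.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_coding_strand(coding_5to3: str) -> str:
    """RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


def transcribe_template_strand(template_3to5: str) -> str:
    """Build the RNA by pairing each template base, read 3'->5', with its RNA partner."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in template_3to5.upper())


coding = "CTCGTTTGCGCTCTATCG"      # written 5'->3'
template = "GAGCAAACGCGAGATAGC"    # written 3'->5'

print(transcribe_coding_strand(coding))
print(transcribe_template_strand(template))
```

Both calls give the same RNA sequence, because the template strand is the complement of the coding strand and adenine in the template pairs with uracil instead of thymine.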

Nucleosides

Nitrogenous base	Nucleoside	Deoxynucleoside
Adenine	Adenosine A	Deoxyadenosine dA
Guanine	Guanosine G	Deoxyguanosine dG
Thymine	5-Methyluridine m⁵U	Thymidine dT
Uracil	Uridine U	Deoxyuridine dU
Cytosine	Cytidine C	Deoxycytidine dC

Nucleosides are glycosylamines consisting of a nucleobase (often referred to as simply base) bound to a ribose or deoxyribose sugar via a beta-glycosidic linkage. Examples of nucleosides include cytidine, uridine, adenosine, guanosine, thymidine and inosine. Nucleosides can be phosphorylated by specific kinases in the cell on the sugar's primary alcohol group (-CH2-OH), producing nucleotides, which are the molecular building-blocks of DNA and RNA.

Nucleosides can be produced by de novo synthesis pathways, in particular in the liver, but they are more abundantly supplied via ingestion and digestion of nucleic acids in the diet, whereby nucleotidases break down nucleotides (such as the thymine nucleotide) into nucleosides (such as thymidine) and phosphate.

1. Adenosine is a nucleoside composed of a molecule of adenine attached to a ribose sugar molecule (ribofuranose) moiety via a β-N9-glycosidic bond.

2. Cytidine is a nucleoside molecule that is formed when cytosine is attached to a ribose ring (also known as a ribofuranose) via a β-N1-glycosidic bond. Cytidine is a component of RNA.

3. Guanosine is a purine nucleoside comprising guanine attached to a ribose (ribofuranose) ring via a β-N9-glycosidic bond. Guanosine can be phosphorylated to become guanosine monophosphate (GMP), cyclic guanosine monophosphate (cGMP), guanosine diphosphate (GDP), and guanosine triphosphate (GTP).

4. Thymidine (more precisely called deoxythymidine; can also be labelled deoxyribosylthymine, and thymine deoxyriboside) is a chemical compound, more precisely a pyrimidine deoxynucleoside. Deoxythymidine is the DNA nucleoside T, which pairs with deoxyadenosine (A) in double-stranded DNA.

If cytosine is attached to a deoxyribose ring, the nucleoside is known as deoxycytidine.^[9]

Nucleotide

A nucleotide is composed of a nucleobase (nitrogenous base), a five-carbon sugar (either ribose or 2'-deoxyribose), and one to three phosphate groups. Together, the nucleobase and sugar comprise a nucleoside. The phosphate groups form bonds with either the 2, 3, or 5-carbon of the sugar, with the 5-carbon site most common. Cyclic nucleotides form when the phosphate group is bound to two of the sugar's hydroxyl groups. Ribonucleotides are nucleotides where the sugar is ribose, and deoxyribonucleotides contain the sugar deoxyribose. Nucleotides can contain either a purine or a pyrimidine base. Nucleic acids are polymeric macromolecules made from nucleotide monomers. In DNA, the purine bases are adenine and guanine, while the pyrimidines are thymine and cytosine. RNA uses uracil in place of thymine. Adenine always pairs with thymine by 2 hydrogen bonds, while guanine pairs with cytosine through 3 hydrogen bonds, each due to their unique structures.
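Since each A–T pair contributes two hydrogen bonds and each G–C pair contributes three, the number of base-pairing hydrogen bonds in a duplex follows directly from the sequence of one strand. A minimal sketch, assuming perfect Watson-Crick pairing along the whole strand (the function names are illustrative):

```python
# Hydrogen bonds formed by the base pair that each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total base-pairing hydrogen bonds in a fully paired duplex."""
    return sum(H_BONDS[base] for base in strand.upper())


def gc_fraction(strand: str) -> float:
    """Fraction of positions occupied by G or C."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


strand = "CTCGTTTGCGCTCTATCG"
print(hydrogen_bonds(strand), "hydrogen bonds")
print(round(gc_fraction(strand), 3), "GC fraction")
```

A higher GC fraction means more hydrogen bonds per base pair, which is one reason GC-rich duplexes are generally harder to separate.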

A deoxyribonucleotide is the monomer, or single unit, of DNA, or deoxyribonucleic acid. Each deoxyribonucleotide comprises three parts: a nitrogenous base, a deoxyribose sugar, and one or more phosphate groups. The nitrogenous base is always bonded to the 1' carbon of the deoxyribose, which is distinguished from ribose by the presence of a hydrogen atom on the 2' carbon rather than an -OH group. The phosphate groups bind to the 5' carbon of the sugar. When deoxyribonucleotides polymerize to form DNA, the phosphate group from one nucleotide bonds to the 3' carbon on another nucleotide, forming a phosphodiester bond via dehydration synthesis. New nucleotides are always added to the 3' carbon of the last nucleotide, so synthesis always proceeds from 5' to 3'.^[10]

Phosphodiester bond

A phosphodiester bond is a group of strong covalent bonds between a phosphate group and two 5-carbon ring carbohydrates (pentoses) through two ester bonds. Phosphodiester bonds are central to most life on Earth, as they make up the backbone of the strands of DNA. In DNA and RNA, the phosphodiester bond is the linkage between the 3' carbon atom of one sugar molecule and the 5' carbon of another; the sugar is deoxyribose in DNA and ribose in RNA. Because the phosphate groups in the phosphodiester bond have a pKa near 0, they are negatively charged at pH 7. The resulting repulsion forces the phosphates to take opposite sides of the DNA strands and is neutralized by proteins (histones), metal ions such as magnesium, and polyamines. In order for the phosphodiester bond to be formed and the nucleotides to be joined, the triphosphate or diphosphate forms of the nucleotide building blocks are broken apart to give off the energy required to drive the enzyme-catalyzed reaction. The phosphodiester bond is formed when a single phosphate, or two phosphates known as pyrophosphate, breaks away and drives the reaction. Hydrolysis of phosphodiester bonds can be catalyzed by phosphodiesterases, which play an important role in repairing DNA sequences. In biological systems, the phosphodiester bond between two ribonucleotides can be broken by alkaline hydrolysis because of the free 2' hydroxyl group.^[11]

Diagram of phosphodiester bonds (PO₄³⁻) between nucleotides, showing Thymine (U) and two molecules of Adenine (A).

Adenosine monophosphate AMP	Adenosine diphosphate ADP	Adenosine triphosphate ATP
Guanosine monophosphate GMP	Guanosine diphosphate GDP	Guanosine triphosphate GTP
Ribothymidine monophosphate rTMP	Ribothymidine diphosphate rTDP	Ribothymidine triphosphate rTTP
Uridine monophosphate UMP	Uridine diphosphate UDP	Uridine triphosphate UTP
Cytidine monophosphate CMP	Cytidine diphosphate CDP	Cytidine triphosphate CTP

Forms of DNA

A-DNA: A-DNA is one of the many possible double helical structures of DNA. A-DNA is thought to be one of three biologically active double helical structures along with B- and Z-DNA. It is a right-handed double helix fairly similar to the more common and well-known B-DNA form, but with a shorter more compact helical structure. It appears likely that it occurs only in dehydrated samples of DNA, such as those used in crystallographic experiments, and possibly is also assumed by DNA-RNA hybrid helices and by regions of double-stranded RNA.^[12]

B-DNA: The most common form of DNA is B-DNA. The DNA double helix is a spiral polymer of nucleic acids, held together by nucleotides that base-pair with one another. In B-DNA, the double helix is right-handed with about 10–10.5 nucleotides per turn. The double helix structure of DNA contains a major groove and a minor groove, the major groove being wider than the minor groove. Given the difference in widths of the two grooves, many proteins which bind to DNA do so through the wider major groove.

Z-DNA: Z-DNA is one of the many possible double helical structures of DNA. It is a left-handed double helical structure in which the double helix winds to the left in a zig-zag pattern (instead of to the right, like the more common B-DNA form). Z-DNA is thought to be one of three biologically active double helical structures, along with A- and B-DNA. Z-DNA is quite different from the right-handed forms, and is often compared against B-DNA in order to illustrate the major differences. The Z-DNA helix has a structure that repeats every 2 base pairs. The major and minor grooves, unlike those of A- and B-DNA, show little difference in width. Formation of this structure is generally unfavourable, although certain conditions can promote it, such as an alternating purine-pyrimidine sequence (especially poly(dGC)2), negative DNA supercoiling, or high salt and some cations (all at physiological temperature, 37 °C, and pH 7.3-7.4). Z-DNA can form a junction with B-DNA (called a "B-to-Z junction box") in a structure which involves the extrusion of a base pair. The Z-DNA conformation has been difficult to study because it does not exist as a stable feature of the double helix. Instead, it is a transient structure that is occasionally induced by biological activity and then quickly disappears.^[13]

From left to right, the structures of A, B and Z DNA

Difference between three major forms of DNA
	A-DNA	B-DNA	Z-DNA
Helix sense	Right-handed	Right-handed	Left-handed
Diameter	23 Å (2.3 nm)	20 Å (2.0 nm)	18 Å (1.8 nm)
Repeating unit	1 bp	1 bp	2 bp
Rotation/bp	32.7°	35.9°	60°/2
bp/turn	11	10.5	12
Inclination of bp to axis	+19°	−1.2°	−9°
Rise/bp along axis	2.3 Å (0.23 nm)	3.32 Å (0.332 nm)	3.8 Å (0.38 nm)
Pitch/turn of helix	28.2 Å (2.82 nm)	33.2 Å (3.32 nm)	45.6 Å (4.56 nm)
Mean propeller twist	+18°	+16°	0°
Glycosyl angle	anti	anti	C: anti, G: syn
Sugar pucker	C3'-endo	C2'-endo	C: C2'-endo, G: C2'-exo

bp: base pair; nm: nanometre
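The rise per base pair and the number of base pairs per turn in the table are enough to estimate how long a stretch of DNA is and how many helical turns it makes in each form. The sketch below copies those two rows from the table; the function names are illustrative, and the estimate assumes an ideal, straight helix.

```python
# Values taken from the table above.
RISE_ANGSTROM = {"A": 2.3, "B": 3.32, "Z": 3.8}   # rise per bp along the axis
BP_PER_TURN = {"A": 11, "B": 10.5, "Z": 12}       # base pairs per helical turn


def helix_length_nm(n_bp: int, form: str = "B") -> float:
    """Approximate axial length of n_bp base pairs, in nanometres (10 Å = 1 nm)."""
    return n_bp * RISE_ANGSTROM[form] / 10.0


def helical_turns(n_bp: int, form: str = "B") -> float:
    """Approximate number of helical turns made by n_bp base pairs."""
    return n_bp / BP_PER_TURN[form]


for form in ("A", "B", "Z"):
    print(form, round(helix_length_nm(400, form), 1), "nm,",
          round(helical_turns(400, form), 1), "turns")
```

For the same number of base pairs, the A form comes out shorter and the Z form longer than the B form, which matches the description of A-DNA as the more compact helix.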

Noncoding genomic DNA

In molecular biology, noncoding DNA describes components of an organism's DNA sequences that do not encode for protein sequences.

Pseudogenes: Pseudogenes are DNA sequences, related to known genes, that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Pseudogenes arise from retrotransposition or genomic duplication of functional genes, and become "genomic fossils" that are nonfunctional due to mutations that prevent the transcription of the gene, such as those within the gene promoter region, or that fatally alter the translation of the gene, such as premature stop codons or frameshifts. Pseudogenes resulting from the retrotransposition of an RNA intermediate are known as processed pseudogenes; pseudogenes that arise from the genomic remains of duplicated genes or residues of inactivated genes are nonprocessed pseudogenes. While Dollo's Law suggests that the loss of function in pseudogenes is likely permanent, silenced genes may actually retain function for several million years and can be "reactivated" into protein-coding sequences, and a substantial number of pseudogenes are actively transcribed. Because pseudogenes are presumed to evolve without evolutionary constraint, they can serve as a useful model of the type and frequencies of various spontaneous genetic mutations.^[14]

Coiling of DNA

Supercoiled structure of circular DNA molecules with low writhe. Note that the helical nature of the DNA duplex is omitted for clarity.

Supercoiled structure of linear DNA molecules with constrained ends. Note that the helical nature of the DNA duplex is omitted for clarity.

DNA supercoiling is important for DNA packaging within all cells. Because the length of DNA can be thousands of times that of a cell, packaging this genetic material into the cell or nucleus (in eukaryotes) is a difficult feat. Supercoiling of DNA reduces the space and allows for a lot more DNA to be packaged. In prokaryotes, plectonemic supercoils are predominant, because of the circular chromosome and relatively small amount of genetic material. In eukaryotes, DNA supercoiling exists on many levels of both plectonemic and solenoidal supercoils, with the solenoidal supercoiling proving most effective in compacting the DNA. Solenoidal supercoiling is achieved with histones to form a 10 nm fiber. This fiber is further coiled into a 30 nm fiber, and further coiled upon itself numerous times more. DNA packaging is greatly increased during nuclear division events such as mitosis or meiosis, where DNA must be compacted and segregated to daughter cells. Condensins and cohesins are Structural Maintenance of Chromosome proteins that aid in the condensation of sister chromatids and the linkage of the centromere in sister chromatids. These SMC proteins induce positive supercoils. Supercoiling is also required for DNA/RNA synthesis. Because DNA must be unwound for DNA/RNA polymerase action, supercoils will result. The region ahead of the polymerase complex will be unwound; this stress is compensated with positive supercoils ahead of the complex. Behind the complex, DNA is rewound and there will be compensatory negative supercoils. It is important to note that topoisomerases such as DNA gyrase (Type II Topoisomerase) play a role in relieving some of the stress during DNA/RNA synthesis.^[15]

DNA supercoiling can be described numerically by changes in the 'linking number' Lk. The linking number is the most descriptive property of supercoiled DNA. Lk_o, the number of turns in the relaxed (B type) DNA plasmid/molecule, is determined by dividing the total base pairs of the molecule by the relaxed bp/turn which, depending on the reference, is 10.4-10.5.

Lk_{o} = \mathrm{bp}/10.4

Lk is simply the number of times a single strand crosses the other in a planar projection. The topology of the DNA is described by the equation below, in which the linking number is the sum of Tw, the number of twists or turns of the double helix, and Wr, the number of coils or 'writhes'. In a closed DNA molecule, the sum of Tw and Wr, that is, the linking number, does not change. However, there may be complementary changes in Tw and Wr that leave their sum unchanged.

Lk=Tw+Wr

The change in the linking number, ΔLk, is the actual number of turns in the plasmid/molecule, Lk, minus the number of turns in the relaxed plasmid/molecule Lk_o.

\Delta Lk = Lk - Lk_{o}

If the DNA is negatively supercoiled, ΔLk < 0. Negative supercoiling implies that the DNA is underwound.

A standard expression independent of the molecule size is the "specific linking difference" or "superhelical density" denoted σ. σ represents the number of turns added or removed relative to the total number of turns in the relaxed molecule/plasmid, indicating the level of supercoiling.

\sigma = \Delta Lk / Lk_{o}

The Gibbs free energy associated with the coiling is given by the equation below^[16]

\Delta G/N = 10RT\sigma^{2}
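The quantities defined above can be worked through numerically. The following is a sketch under stated assumptions: the 5200 bp plasmid and its linking number of 470 are hypothetical values chosen for illustration, 10.4 bp/turn is used for the relaxed state, the temperature is taken as 310 K, and N is left as in the equation above (the text does not define it; it is commonly taken to be the number of base pairs).

```python
GAS_CONSTANT = 8.314  # J/(mol*K)


def relaxed_linking_number(n_bp: int, bp_per_turn: float = 10.4) -> float:
    """Lk_o: number of helical turns in the relaxed molecule."""
    return n_bp / bp_per_turn


def linking_difference(lk: float, lk0: float) -> float:
    """Delta Lk = Lk - Lk_o; negative values mean the DNA is underwound."""
    return lk - lk0


def superhelical_density(lk: float, lk0: float) -> float:
    """sigma = Delta Lk / Lk_o, independent of molecule size."""
    return linking_difference(lk, lk0) / lk0


def free_energy_per_n(sigma: float, temperature_k: float = 310.0) -> float:
    """Delta G / N = 10 * R * T * sigma^2, in J/mol."""
    return 10.0 * GAS_CONSTANT * temperature_k * sigma ** 2


n_bp = 5200   # hypothetical plasmid size
lk = 470      # hypothetical measured linking number

lk0 = relaxed_linking_number(n_bp)
sigma = superhelical_density(lk, lk0)

print("Lk_o     =", round(lk0, 2))
print("Delta Lk =", round(linking_difference(lk, lk0), 2))
print("sigma    =", round(sigma, 3))
print("DeltaG/N =", round(free_energy_per_n(sigma), 1), "J/mol")
```

With these inputs the relaxed molecule has about 500 turns, so removing 30 of them gives a superhelical density of about −0.06. Because the free energy depends on σ squared, positive and negative supercoiling of the same magnitude cost the same energy in this expression.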

The linking number is a numerical invariant that describes the linking of two closed curves in three-dimensional space. Intuitively, the linking number represents the number of times that each curve winds around the other. The linking number is always an integer, but may be positive or negative depending on the orientation of the two curves. Since the linking number L of supercoiled DNA is the number of times the two strands are intertwined (and both strands remain covalently intact), L cannot change. The reference state (or parameter) L₀ of a circular DNA duplex is its relaxed state. In this state, its writhe W = 0. Since L = T + W, in a relaxed state T = L. Thus, if we have a 400 bp relaxed circular DNA duplex, L ~ 40 (assuming ~10 bp per turn in B-DNA). Then T ~ 40.

Positive supercoiling:

T = 0, W = 0, then L = 0

T = +3, W = 0, then L = +3

T = +2, W = +1, then L = +3

Negative supercoiling:

T = 0, W = 0, then L = 0

T = -3, W = 0, then L = -3

T = -2, W = -1, then L = -3

Negative supercoils favor local unwinding of the DNA, allowing processes such as transcription, DNA replication, and recombination. Negative supercoiling is also thought to favour the transition between B-DNA and Z-DNA, and moderate the interactions of DNA binding proteins involved in gene regulation.^[17]

Histones: The DNA binding protein

Histones were discovered in 1884 by Albrecht Kossel. The word "histone" dates from the late 19th century and is from the German "Histon", of uncertain origin: perhaps from Greek histanai or from histos. Until the early 1990s, histones were dismissed by most as inert packing material for eukaryotic nuclear DNA, based in part on the "ball and stick" models of Mark Ptashne and others who believed transcription was activated by protein-DNA and protein-protein interactions on largely naked DNA templates, as is the case in bacteria. During the 1980s, work by Michael Grunstein ^[18] demonstrated that eukaryotic histones repress gene transcription, and that the function of transcriptional activators is to overcome this repression. We now know that histones play both positive and negative roles in gene expression, forming the basis of the histone code.

The discovery of the H5 histone appears to date back to the 1970s,^[19]^[20] and in classification it has been grouped with histone H1. The nucleosome core is formed of two H2A-H2B dimers and a H3-H4 tetramer, forming two nearly symmetrical halves by tertiary structure (C2 symmetry; one macromolecule is the mirror image of the other). The H2A-H2B dimers and H3-H4 tetramer also show pseudodyad symmetry. The 4 'core' histones (H2A, H2B, H3 and H4) are relatively similar in structure and are highly conserved through evolution, all featuring a 'helix turn helix turn helix' motif (which allows easy dimerisation). They also share the feature of long 'tails' on one end of the amino acid structure, this being the location of post-translational modification (see below).

The crystal structure of the nucleosome core particle consisting of H2A , H2B , H3 and H4 and DNA. The view is from the top through the superhelical axis.

It has been proposed that histone proteins are evolutionarily related to the helical part of the extended AAA+ ATPase domain, the C-domain, and to the N-terminal substrate recognition domain of Clp/Hsp100 proteins. Despite the differences in their topology, these three folds share a homologous helix-strand-helix (HSH) motif.

Using an electron paramagnetic resonance spin-labeling technique, British researchers measured the distances between the spools around which eukaryotic cells wind their DNA. They determined the spacings range from 59 to 70 Å. In all, histones make five types of interactions with DNA:

Helix-dipoles from alpha-helices in H2B, H3, and H4 cause a net positive charge to accumulate at the point of interaction with negatively charged phosphate groups on DNA

Hydrogen bonds between the DNA backbone and the amide group on the main chain of histone proteins

Nonpolar interactions between the histone and deoxyribose sugars on DNA

Salt bridges and hydrogen bonds between side chains of basic amino acids (especially lysine and arginine) and phosphate oxygens on DNA

Non-specific minor groove insertions of the H3 and H2B N-terminal tails into two minor grooves each on the DNA molecule

The highly basic nature of histones, aside from facilitating DNA-histone interactions, contributes to the water solubility of histones. Histones are subject to post-translational modification by enzymes, primarily on their N-terminal tails but also in their globular domains. Such modifications include methylation, citrullination, acetylation, phosphorylation, SUMOylation, ubiquitination, and ADP-ribosylation. These modifications affect the histones' function in gene regulation. In general, genes that are active have less bound histone, while inactive genes are highly associated with histones during interphase. It also appears that the structure of histones has been evolutionarily conserved, as any deleterious mutations would be severely maladaptive.

Histone DNA interaction

The core histone proteins contain a characteristic structural motif termed the "histone fold" which consists of three alpha-helices (α1-3) separated by two loops (L1-2). In solution the histones form H2A-H2B heterodimers and H3-H4 heterotetramers. Histones dimerise about their long α2 helices in an anti-parallel orientation, and in the case of H3 and H4, two such dimers form a 4-helix bundle stabilised by extensive H3-H3’ interaction. The H2A/H2B dimer binds onto the H3/H4 tetramer due to interactions between H4 and H2B which include the formation of a hydrophobic cluster. The histone octamer is formed by a central H3/H4 tetramer sandwiched between two H2A/H2B dimers. Due to the highly basic charge of all four core histones, the histone octamer is only stable in the presence of DNA or very high salt concentrations.

Nucleosomes form the fundamental repeating units of eukaryotic chromatin, which is used to pack the large eukaryotic genomes into the nucleus while still ensuring appropriate access to it (in mammalian cells approximately 2 m of linear DNA have to be packed into a nucleus of roughly 10 µm diameter). Nucleosomes are folded through a series of successively higher order structures to eventually form a chromosome; this both compacts DNA and creates an added layer of regulatory control which ensures correct gene expression. Nucleosomes are thought to carry epigenetically inherited information in the form of covalent modifications of their core histones. The nucleosome hypothesis was proposed by Don and Ada Olins in 1974 and Roger Kornberg.

The nucleosome core particle consists of about 146 bp of DNA wrapped in 1.67 left-handed superhelical turns around the histone octamer, which consists of 2 copies each of the core histones H2A, H2B, H3, and H4. Adjacent nucleosomes are joined by a stretch of free DNA termed "linker DNA" (which varies from 10 to 80 bp in length depending on species and tissue type).
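The figures above (about 146 bp per core particle plus 10 to 80 bp of linker) allow a rough estimate of how many nucleosomes a stretch of DNA could carry. The following is a minimal sketch under the simplifying assumption that nucleosomes are evenly spaced along the whole molecule; the function name and the 1 Mb example region are illustrative and not taken from the text.

```python
# Rough nucleosome-count estimate (illustrative sketch, not a measured value).
CORE_BP = 146  # DNA wrapped around one histone octamer, as stated in the text


def nucleosome_count(total_bp, linker_bp):
    """Estimate how many nucleosomes fit on total_bp of DNA.

    Assumes (hypothetically) uniform spacing: each repeat unit is the
    146 bp core plus one linker of linker_bp base pairs.
    """
    if not 10 <= linker_bp <= 80:
        raise ValueError("the text gives linker DNA as 10 to 80 bp")
    return total_bp // (CORE_BP + linker_bp)


if __name__ == "__main__":
    region = 1_000_000  # a hypothetical 1 Mb region
    print(nucleosome_count(region, 10))  # shortest linker, densest packing
    print(nucleosome_count(region, 80))  # longest linker, sparsest packing
```

Real chromatin is not uniformly spaced, so the result is only an order-of-magnitude guide.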

DNA-binding domains

The λ repressor of bacteriophage lambda employs a helix-turn-helix (left; green) to bind DNA (right; blue and red).

One or more DNA-binding domains are often part of a larger protein consisting of additional domains with differing function. The additional domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involving transcription regulation, with the two roles sometimes overlapping. DNA-binding domains with functions involving DNA structure have biological roles in the replication, repair, storage, and modification of DNA, such as methylation. Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation. The DBD interacts with the nucleotides of DNA in a DNA sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some sort of molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate DNA backbone (see the structure of DNA). Each specific type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNAse I cuts DNA almost randomly and so must bind to DNA in a non-sequence-specific manner. But, even so, DNAse I recognizes a certain 3-D DNA structure, yielding a somewhat specific DNA cleavage pattern that can be useful for studying DNA recognition by a technique called DNA footprinting. Many DNA-binding domains must recognize specific DNA sequences, such as DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen bonding pattern in the DNA major groove is less degenerate than that of the DNA minor groove, providing a more attractive site for sequence-specific DNA recognition. 
The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutation, protein structure mutation or modification, nuclear magnetic resonance, x-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking and Microscale Thermophoresis (MST).^[21]

Types of DNA-binding domains

Helix-turn-helix

Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the homeodomain comprises 2 helices, one of which recognizes the DNA (aka recognition helix). They are common in proteins that regulate developmental processes (PROSITE HTH).^[22]

Zinc finger

Leucine Zipper (blue) bound to DNA. The leucine residues that represent the 'teeth' of the zipper are colored red

Crystallographic structure (PDB 1R4O) of a dimer of the zinc finger containing DBD of the glucocorticoid receptor (top) bound to DNA (bottom). Zinc atoms are represented by grey spheres and the coordinating cysteine sidechains are depicted as sticks.

The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet. In transcription factors these domains are often found in arrays (usually separated by short linker sequences), and adjacent fingers are spaced at 3 basepair intervals when bound to DNA.

Crick and Watson DNA model built in 1953, which was reconstructed largely from its original pieces in 1973 and donated to the National Science Museum in London.

Fold Group	Representative structure	Ligand placement
Cys₂His₂		Two ligands from a knuckle and two more from the C terminus of a helix.
Gag knuckle		Two ligands from a knuckle and two more from a short helix or loop.
Treble clef		Two ligands from a knuckle and two more from the N terminus of a helix.
Zinc ribbon		Two ligands each from two knuckles.
Zn₂/Cys₆		Two ligands from the N terminus of a helix and two more from a loop.
TAZ2 domain like		Two ligands from the termini of two helices.

Leucine zipper

The basic leucine zipper (bZIP) domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact as the teeth in a zipper, allowing dimerization of two proteins. When binding to the DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major grooves. The domain regulates gene expression. The bZIP family of transcription factors consists of a basic region that interacts with the major groove of a DNA molecule through hydrogen bonding, and a hydrophobic leucine zipper region that is responsible for dimerization.
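The "leucine at every 7th amino acid" rule can be expressed as a simple scan over a protein sequence written in one-letter code. This is only a sketch: the function name is hypothetical, the default of four consecutive leucines is an assumed cut-off, and real zipper prediction also considers helix propensity and the neighbouring basic region.

```python
def leucine_heptad_starts(protein, min_repeats=4):
    """Return 0-based positions where a leucine (L) recurs every 7th residue.

    min_repeats is an assumed threshold for how many leucines in a row
    (spaced 7 apart) count as a candidate zipper.
    """
    hits = []
    for start in range(len(protein)):
        # Positions start, start+7, start+14, ... (min_repeats of them).
        positions = range(start, start + 7 * min_repeats, 7)
        if positions[-1] < len(protein) and all(protein[p] == "L" for p in positions):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: four heptads, each beginning with leucine.
    toy = "LAAAAAA" * 4
    print(leucine_heptad_starts(toy))
```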

Winged helix

Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-strand beta-sheet.

Winged helix turn helix

The winged helix turn helix domain (wHTH), SCOP 46785, is typically 85-90 amino acids long. It is formed by a 3-helical bundle and a 4-strand beta-sheet (wing).

Helix-loop-helix

The Helix-loop-helix domain is found in some transcription factors and is characterized by two α helices connected by a loop. One helix is typically smaller and due to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions.

HMG-box

HMG-box domains are found in high mobility group proteins which are involved in a variety of DNA-dependent processes like replication and transcription. The domain consists of three alpha helices separated by loops.

DNA sequencing

RNA sequencing was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers and his coworkers at the University of Ghent (Ghent, Belgium), between 1972 and 1976. Prior to the development of rapid DNA sequencing methods in the early 1970s by Frederick Sanger at the University of Cambridge, in England and Walter Gilbert and Allan Maxam at Harvard, a number of laborious methods were used. For instance, in 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis. The chain-termination method developed by Sanger and coworkers in 1975 soon became the method of choice, owing to its relative ease and reliability.^[23]

Maxam and Gilbert method

In 1976–1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases. Although Maxam and Gilbert published their chemical sequencing method two years after the ground-breaking paper of Sanger and Coulson on plus-minus sequencing, Maxam–Gilbert sequencing rapidly became more popular, since purified DNA could be used directly, while the initial Sanger method required that each read start be cloned for production of single-stranded DNA. However, with the improvement of the chain-termination method (see below), Maxam–Gilbert sequencing has fallen out of favour due to its technical complexity, which prohibits its use in standard molecular biology kits, its extensive use of hazardous chemicals, and difficulties with scale-up.

The method requires radioactive labeling at one 5' end of the DNA (typically by a kinase reaction using gamma-32P ATP) and purification of the DNA fragment to be sequenced. In each of four reactions (G, A+G, C, C+T), chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases. For example, the purines (A+G) are depurinated using formic acid, the guanines (and to some extent the adenines) are methylated by dimethyl sulfate, and the pyrimidines (C+T) are hydrolysed using hydrazine. The addition of salt (sodium chloride) to the hydrazine reaction inhibits the reaction of thymine, giving the C-only reaction. The modified DNAs are then cleaved by hot piperidine at the position of the modified base. The concentration of the modifying chemicals is controlled to introduce on average one modification per DNA molecule. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation.

To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred. Also sometimes known as "chemical sequencing", this method led to the Methylation Interference Assay used to map DNA-binding sites for DNA-binding proteins.^[24]
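How a sequence is inferred from the four lanes (G, A+G, C, C+T) can be illustrated with a small simulation. The sketch below is idealised: it assumes every base is cleaved cleanly in exactly the lanes named above, and it keys each band by the position of the modified base (the real labelled fragment ends just before that base). The function names are illustrative.

```python
LANES = ("G", "A+G", "C", "C+T")


def cleave_lanes(sequence):
    """Simulate which lanes show a band for each position of the sequence."""
    lanes = {name: set() for name in LANES}
    for position, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(position)
            lanes["A+G"].add(position)
        elif base == "A":
            lanes["A+G"].add(position)
        elif base == "C":
            lanes["C"].add(position)
            lanes["C+T"].add(position)
        elif base == "T":
            lanes["C+T"].add(position)
    return lanes


def read_lanes(lanes):
    """Infer the sequence by comparing lanes, shortest fragment first."""
    called = []
    for position in sorted(set().union(*lanes.values())):
        if position in lanes["G"]:
            called.append("G")   # band in G (and A+G): guanine
        elif position in lanes["A+G"]:
            called.append("A")   # band in A+G only: adenine
        elif position in lanes["C"]:
            called.append("C")   # band in C (and C+T): cytosine
        elif position in lanes["C+T"]:
            called.append("T")   # band in C+T only: thymine
    return "".join(called)


if __name__ == "__main__":
    print(read_lanes(cleave_lanes("GATTACA")))
```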

Dideoxynucleotide Chain-termination methods

Part of a radioactively labelled sequencing gel

Because the chain-terminator method (or Sanger method after its developer Frederick Sanger) is more efficient and uses fewer toxic chemicals and lower amounts of radioactivity than the method of Maxam and Gilbert, it rapidly became the method of choice. The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs, or ddNTPs) that terminate DNA strand elongation. These ddNTPs may also be radioactively or fluorescently labelled for detection in automated sequencing machines. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These are the chain-terminating nucleotides: they lack the 3'-hydroxyl (OH) group required for the formation of a phosphodiester bond between two nucleotides, so their incorporation terminates DNA strand extension and results in DNA fragments of varying length.

The newly synthesized and labelled DNA fragments are heat denatured, and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C); the DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be directly read off the X-ray film or gel image. In the image on the right, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes are then used to read (from bottom to top) the DNA sequence.^[25]
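Reading a four-lane gel from bottom to top can be mimicked in a few lines. This sketch assumes an idealised experiment in which every possible termination product is present, ignores the primer length, and takes the template written 5' to 3'; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template):
    """Strand synthesized on the template (its reverse complement, 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lanes(strand):
    """Fragment lengths seen in each ddNTP reaction (one lane per base)."""
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(strand, start=1):
        # A fragment of this length ends with the dideoxy form of `base`.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest (bottom of gel) to longest (top)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    strand = new_strand("ATGC")
    print(read_gel(termination_lanes(strand)))
```

The sequence read from the gel is that of the newly synthesized strand, which is complementary to the template.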

DNA fragments are labelled with a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP.

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5’ end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers ^[26]^[27] of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

Sequence ladder by radioactive sequencing compared to fluorescent peaks

Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

Dye-terminator sequencing

Capillary electrophoresis

Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with fluorescent dyes, each of which emit light at different wavelengths.

Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis (see figure to the left).

This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects.

Challenges

Common challenges of DNA sequencing include poor quality in the first 15–40 bases of the sequence and deteriorating quality of sequencing traces after 700–900 bases. Base calling software typically gives an estimate of quality to aid in quality trimming.^[28]^[29]
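Quality trimming can be sketched as follows. The example assumes Phred-style quality scores, where a score Q corresponds to an error probability of 10^(-Q/10), and uses a plain threshold rule; the threshold of 20 and the function names are assumptions for illustration, and real base-calling software typically uses more elaborate windowed schemes.

```python
def error_probability(q):
    """Error probability implied by a Phred-style quality score."""
    return 10 ** (-q / 10)


def trim_by_quality(bases, qualities, min_q=20):
    """Remove low-quality bases from both ends of a read.

    min_q is an assumed cut-off; bases are dropped from each end until
    one with quality >= min_q is reached.
    """
    start = 0
    while start < len(bases) and qualities[start] < min_q:
        start += 1
    end = len(bases)
    while end > start and qualities[end - 1] < min_q:
        end -= 1
    return bases[start:end], qualities[start:end]


if __name__ == "__main__":
    read = "ACGTACGT"
    quals = [5, 10, 30, 35, 40, 30, 12, 8]  # made-up scores
    print(trim_by_quality(read, quals))
```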

In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and emerging sequencing technologies based on pyrosequencing often avoid using cloning vectors. Recently, one-step Sanger sequencing (combined amplification and sequencing) methods such as Ampliseq and SeqSharp have been developed that allow rapid sequencing of target genes without cloning or prior amplification.^[30]^[31]

Current methods can directly sequence only relatively short (300–1000 nucleotides long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. In all cases the use of a primer with a free 3' end is essential, since this is the end that the DNA polymerase extends.

Automation and sample preparation

View of the start of an example dye-terminator read

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms. Sequencing reactions by thermocycling, cleanup and re-suspension in a buffer solution before loading onto the sequencer are performed separately. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks (generally located at the ends of the sequence). The accuracy of such algorithms is below visual examination by a human operator, but sufficient for automated processing of large sequence data sets.

Polymerase chain reaction

Figure 1: Schematic drawing of the PCR cycle. (1) Denaturing at 94–96 °C. (2) Annealing at ~65 °C (3) Elongation at 72 °C. Four cycles are shown here. The blue lines represent the DNA template to which primers (red arrows) anneal that are extended by the DNA polymerase (light green circles), to give shorter DNA products (green lines), which themselves are used as templates as PCR progresses.

PCR

PCR is used to amplify a specific region of a DNA strand (the DNA target). Most PCR methods typically amplify DNA fragments of up to ~10 kilo base pairs (kb), although some techniques allow for amplification of fragments up to 40 kb in size. A basic PCR set up requires several components and reagents. These components include:

DNA template that contains the DNA region (target) to be amplified.

Two primers that are complementary to the 3' (three prime) ends of each of the sense and anti-sense strand of the DNA target.

Taq polymerase or another DNA polymerase with a temperature optimum at around 70 °C.

Deoxynucleotide triphosphates (dNTPs), the building-blocks from which the DNA polymerase synthesizes a new DNA strand.

Buffer solution, providing a suitable chemical environment for optimum activity and stability of the DNA polymerase.

Divalent cations, magnesium or manganese ions; generally Mg2+ is used, but Mn2+ can be utilized for PCR-mediated DNA mutagenesis, as higher Mn2+ concentration increases the error rate during DNA synthesis.

Monovalent cation potassium ions.

The PCR is commonly carried out in a reaction volume of 10–200 μl in small reaction tubes (0.2–0.5 ml volumes) in a thermal cycler. The thermal cycler heats and cools the reaction tubes to achieve the temperatures required at each step of the reaction (see below). Many modern thermal cyclers make use of the Peltier effect, which permits both heating and cooling of the block holding the PCR tubes simply by reversing the electric current. Thin-walled reaction tubes permit favorable thermal conductivity to allow for rapid thermal equilibration. Most thermal cyclers have heated lids to prevent condensation at the top of the reaction tube. Older thermocyclers lacking a heated lid require a layer of oil on top of the reaction mixture or a ball of wax inside the tube.^[32]

Procedure

Typically, PCR consists of a series of 20-40 repeated temperature changes, called cycles, with each cycle commonly consisting of 2-3 discrete temperature steps, usually three. The cycling is often preceded by a single temperature step (called hold) at a high temperature (>90 °C), and followed by one hold at the end for final product extension or brief storage. The temperatures used and the length of time they are applied in each cycle depend on a variety of parameters. These include the enzyme used for DNA synthesis, the concentration of divalent ions and dNTPs in the reaction, and the melting temperature (Tm) of the primers.

Initialization step: This step consists of heating the reaction to a temperature of 94–96 °C (or 98 °C if extremely thermostable polymerases are used), which is held for 1–9 minutes. It is only required for DNA polymerases that require heat activation by hot-start PCR.

Denaturation step: This step is the first regular cycling event and consists of heating the reaction to 94–98 °C for 20–30 seconds. It causes DNA melting of the DNA template by disrupting the hydrogen bonds between complementary bases, yielding single-stranded DNA molecules.

Annealing step: The reaction temperature is lowered to 50–65 °C for 20–40 seconds, allowing annealing of the primers to the single-stranded DNA template. Typically the annealing temperature is about 3-5 degrees Celsius below the Tm of the primers used. Stable DNA-DNA hydrogen bonds are only formed when the primer sequence very closely matches the template sequence. The polymerase binds to the primer-template hybrid and begins DNA synthesis.

Extension/elongation step: The temperature at this step depends on the DNA polymerase used; Taq polymerase has its optimum activity temperature at 75–80 °C, and commonly a temperature of 72 °C is used with this enzyme. At this step the DNA polymerase synthesizes a new DNA strand complementary to the DNA template strand by adding dNTPs that are complementary to the template in 5' to 3' direction, condensing the 5'-phosphate group of the dNTPs with the 3'-hydroxyl group at the end of the nascent (extending) DNA strand. The extension time depends both on the DNA polymerase used and on the length of the DNA fragment to be amplified. As a rule of thumb, at its optimum temperature, the DNA polymerase will polymerize a thousand bases per minute. Under optimum conditions, i.e., if there are no limitations due to limiting substrates or reagents, at each extension step the amount of DNA target is doubled, leading to exponential (geometric) amplification of the specific DNA fragment.

Final elongation: This single step is occasionally performed at a temperature of 70–74 °C for 5–15 minutes after the last PCR cycle to ensure that any remaining single-stranded DNA is fully extended.

Final hold: This step at 4–15 °C for an indefinite time may be employed for short-term storage of the reaction.
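The rules of thumb in the procedure (about a thousand bases per minute of extension, annealing 3-5 °C below the primer Tm, and doubling of the target at each cycle) can be turned into a small calculator. This is a sketch only; the function names are illustrative, and the efficiency parameter is an added assumption, with 1.0 meaning the ideal doubling described in the text.

```python
def extension_time_seconds(amplicon_bp, bases_per_minute=1000):
    """Extension time from the 'thousand bases per minute' rule of thumb."""
    return 60 * amplicon_bp / bases_per_minute


def annealing_range(primer_tm):
    """Annealing temperature window, 3-5 degrees Celsius below the primer Tm."""
    return (primer_tm - 5, primer_tm - 3)


def theoretical_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after a number of cycles.

    efficiency is a hypothetical per-cycle factor (not from the text):
    1.0 gives perfect doubling, lower values model limiting reagents.
    """
    return initial_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    print(extension_time_seconds(1500))   # seconds for a 1.5 kb target
    print(annealing_range(60))            # for primers with Tm = 60 degrees C
    print(theoretical_copies(1, 30))      # ideal yield after 30 cycles
```

In practice the yield plateaus well below the ideal figure as primers and dNTPs are used up.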

To check whether the PCR generated the anticipated DNA fragment (also sometimes referred to as the amplimer or amplicon), agarose gel electrophoresis is employed for size separation of the PCR products. The size(s) of PCR products is determined by comparison with a DNA ladder (a molecular weight marker), which contains DNA fragments of known size, run on the gel alongside the PCR products.

Facts to be remembered

DNA polymerases are enzymes that synthesize polynucleotide chains from nucleoside triphosphates and so make DNA.

In 1865 Gregor Mendel presented his paper, Experiments on Plant Hybridization.

In 1869, DNA was first isolated by the Swiss physician Friedrich Miescher who discovered a microscopic substance in the pus of discarded surgical bandages.

From 1880-1890 Walther Flemming, Eduard Strasburger, and Edouard van Beneden elucidate chromosome distribution during cell division

In 1889 Hugo de Vries postulates that "inheritance of specific traits in organisms comes in particles", naming such particles "(pan)genes"

In 1903 Walter Sutton hypothesizes that chromosomes, which segregate in a Mendelian fashion, are hereditary units

In 1905 William Bateson coins the term "genetics" in a letter to Adam Sedgwick and at a meeting in 1906

In 1908 the Hardy-Weinberg law is derived.

In 1910 Thomas Hunt Morgan shows that genes reside on chromosomes

In 1913 Alfred Sturtevant makes the first genetic map of a chromosome

In 1913 Gene maps show chromosomes containing linear arranged genes

In 1918 Ronald Fisher publishes "The Correlation Between Relatives on the Supposition of Mendelian Inheritance"; the modern synthesis of genetics and evolutionary biology starts. See population genetics.

In 1928 Frederick Griffith discovers that hereditary material from dead bacteria can be incorporated into live bacteria (see Griffith's experiment)

In 1931 crossing over is identified as the cause of recombination

In 1933 Jean Brachet is able to show that DNA is found in chromosomes and that RNA is present in the cytoplasm of all cells.

In 1937 William Astbury produced the first X-ray diffraction patterns that showed that DNA had a regular structure.

In 1928, Frederick Griffith discovered that traits of the "smooth" form of the Pneumococcus could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" bacteria with the live "rough" form.

In 1952, Alfred Hershey and Martha Chase in the Hershey–Chase experiment showed that DNA is the genetic material of the T2 phage.

In 1953, James D. Watson and Francis Crick proposed the double-helix model of DNA structure.

Purines are found in high concentration in meat and meat products, especially internal organs such as liver and kidney.

Examples of high-purine sources include: sweetbreads, anchovies, sardines, liver, beef kidneys, brains, meat extracts (e.g., Oxo, Bovril), herring, mackerel, scallops, game meats, beer (from the yeast) and gravy.

bp = base pair(s). One bp corresponds to circa 3.4 Å of length along the strand.

kb (= kbp) = kilo base pairs = 1,000 bp

Mb = mega base pairs = 1,000,000 bp
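The unit definitions and the 3.4 Å per base pair figure make it easy to convert a sequence length into a physical length. The sketch below is illustrative (the function names are not from the text) and treats the DNA as a straight, fully extended double helix.

```python
ANGSTROM_PER_BP = 3.4     # rise per base pair, as stated in the text
METRES_PER_ANGSTROM = 1e-10

BP_PER_UNIT = {"bp": 1, "kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}


def to_bp(value, unit):
    """Convert a length given in bp, kb, Mb or Gb into base pairs."""
    return value * BP_PER_UNIT[unit]


def contour_length_m(bp):
    """Length in metres of a fully extended double helix of bp base pairs."""
    return bp * ANGSTROM_PER_BP * METRES_PER_ANGSTROM


if __name__ == "__main__":
    # A hypothetical 10 kb plasmid, reported in micrometres.
    print(contour_length_m(to_bp(10, "kb")) * 1e6)
```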

Analysis of DNA topology uses three values:

L = linking number - the number of times one DNA strand wraps around the other. It is an integer for a closed loop and constant for a closed topological domain.

T = twist - total number of turns in the double stranded DNA helix. This will normally tend to approach the number of turns that a topologically open double stranded DNA helix makes free in solution: number of bases/10.5, assuming there are no intercalating agents (e.g., chloroquine) or other elements modifying the stiffness of the DNA.

W = writhe - number of turns of the double stranded DNA helix around the superhelical axis

L = T + W and ΔL = ΔT + ΔW

Any change of T in a closed topological domain must be balanced by a change in W, and vice versa. This results in higher order structure of DNA. A circular DNA molecule with a writhe of 0 will be circular. If the twist of this molecule is subsequently increased or decreased by supercoiling then the writhe will be appropriately altered, making the molecule undergo plectonemic or toroidal superhelical coiling. When the ends of a piece of double stranded helical DNA are joined so that it forms a circle, the strands are topologically knotted. This means the single strands cannot be separated by any process that does not involve breaking a strand (heating, for example, is not enough). The task of un-knotting topologically linked strands of DNA falls to enzymes known as topoisomerases. These enzymes are dedicated to un-knotting circular DNA by cleaving one or both strands so that another double or single stranded segment can pass through. This un-knotting is required for the replication of circular DNA and for various types of recombination in linear DNA which have similar topological constraints.
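The bookkeeping in L = T + W can be made concrete with a short calculation. The sketch assumes a closed circular molecule whose twist stays at the relaxed value of one turn per 10.5 base pairs, so that any change in linking number shows up as writhe; the 5,250 bp plasmid and the function names are made up for illustration.

```python
BP_PER_TURN = 10.5  # relaxed helical repeat used in the text


def relaxed_twist(n_bp):
    """Twist T of a relaxed double helix of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def writhe(linking_number, twist):
    """Writhe W from L = T + W."""
    return linking_number - twist


if __name__ == "__main__":
    n_bp = 5250                      # hypothetical plasmid
    t = relaxed_twist(n_bp)          # 500 turns when relaxed
    print(writhe(500, t))            # relaxed circle: no writhe
    print(writhe(475, t))            # L lowered by 25: negative supercoils
```

In a real molecule the change is shared between twist and writhe, so this shows only the limiting case in which all of it appears as writhe.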

Gb = giga base pairs = 1,000,000,000 bp.

1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.

1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174.

1977 Allan Maxam and Walter Gilbert publish "DNA sequencing by chemical degradation". Frederick Sanger, independently, publishes "DNA sequencing with chain-terminating inhibitors".

1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.

1986 Leroy E. Hood's laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine.

1987 Applied Biosystems markets first automated sequencing machine, the model ABI 370.

1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at US$0.75/base).

1991 Sequencing of human expressed sequence tags begins in Craig Venter's lab, an attempt to capture the coding fraction of the human genome.

1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137 bases and its publication in the journal Science marks the first use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts.

1996 Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.

1998 Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.

2001 A draft sequence of the human genome is published.

2004 454 Life Sciences markets a parallelized version of pyrosequencing. The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of a new generation of sequencing technologies, after MPSS.

List of bases found in DNA and RNA

Name	Abbreviation	Classification	Found in
Cytosine	C	Pyrimidine	DNA, RNA
Thymine	T	Pyrimidine	DNA
Uracil	U	Pyrimidine	RNA
Adenine	A	Purine	DNA, RNA
Guanine	G	Purine	DNA, RNA
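
The table above can also be written as a small lookup structure. The following is a minimal Python sketch for illustration only; the names `BASES`, `alphabet`, and `could_be` are hypothetical and do not come from any library. It shows how the "Found in" column separates the DNA alphabet from the RNA alphabet.

```python
# Bases from the table above: abbreviation -> (name, classification, found in).
# All names in this sketch are illustrative assumptions, not a library API.
BASES = {
    "C": ("Cytosine", "Pyrimidine", {"DNA", "RNA"}),
    "T": ("Thymine", "Pyrimidine", {"DNA"}),
    "U": ("Uracil", "Pyrimidine", {"RNA"}),
    "A": ("Adenine", "Purine", {"DNA", "RNA"}),
    "G": ("Guanine", "Purine", {"DNA", "RNA"}),
}


def alphabet(nucleic_acid):
    """Return the set of base abbreviations found in 'DNA' or in 'RNA'."""
    return {
        abbreviation
        for abbreviation, (_name, _kind, found_in) in BASES.items()
        if nucleic_acid in found_in
    }


def could_be(sequence, nucleic_acid):
    """Return True if every letter of the sequence is a base of that nucleic acid."""
    return set(sequence.upper()) <= alphabet(nucleic_acid)


if __name__ == "__main__":
    print(sorted(alphabet("DNA")))
    print(sorted(alphabet("RNA")))
    print(could_be("GATTACA", "DNA"), could_be("GATTACA", "RNA"))
```

A sequence containing thymine (T) passes the DNA check but fails the RNA check, and a sequence containing uracil (U) does the opposite. A sequence made only of A, C, and G passes both, so the check can rule a nucleic acid out but cannot identify one.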

References

↑ DNA
↑ Griffith experiment
↑ Hershey–Chase experiment
↑ Hershey, A.D. and Chase, M. (1952) Independent functions of viral protein and nucleic acid in growth of bacteriophage. J Gen Physiol. 36:39–56.
↑ Avery–MacLeod–McCarty experiment
↑ Base pair
↑ Pyrimidine
↑ Cytosine
↑ Nucleoside
↑ Nucleotide
↑ Phosphodiester bond
↑ A-DNA
↑ http://en.wikipedia.org/wiki/Z-DNA
↑ Noncoding DNA
↑ DNA supercoil
↑ Vologodskii AV, Lukashin AV, Anshelevich VV; et al. (1979). "Fluctuations in superhelical DNA". Nucleic Acids Res. 6: 967–982. doi:10.1093/nar/6.3.967.
↑ H. S. Chawla (2002). Introduction to Plant Biotechnology. Science Publishers. ISBN 1578082285.
↑ Kayne PS, Kim UJ, Han M, Mullen JR, Yoshizaki F, Grunstein M. Extremely conserved histone H4 N terminus is dispensable for growth but essential for repressing the silent mating loci in yeast. Cell. 1988 Oct 7;55(1):27-39. PMID 3048701
↑ Crane-Robinson C, Dancy SE, Bradbury EM, Garel A, Kovacs AM, Champagne M, Daune M (1976). "Structural studies of chicken erythrocyte histone H5". Eur. J. Biochem. 67 (2): 379–88. doi:10.1111/j.1432-1033.1976.tb10702.x. PMID 964248.
↑ Aviles FJ, Chapman GE, Kneale GG, Crane-Robinson C, Bradbury EM (1978). "The conformation of histone H5. Isolation and characterisation of the globular segment". Eur. J. Biochem. 88 (2): 363–71. doi:10.1111/j.1432-1033.1978.tb12457.x. PMID 689022.
↑ http://en.wikipedia.org/wiki/DNA-binding_domain
↑ http://en.wikipedia.org/wiki/DNA-binding_domain
↑ http://en.wikipedia.org/wiki/DNA_sequencing
↑ http://en.wikipedia.org/wiki/DNA_sequencing
↑ DNA sequencing
↑ Smith LM, Sanders JZ, Kaiser RJ; et al. (1986). "Fluorescence detection in automated DNA sequence analysis". Nature. 321 (6071): 674–9. doi:10.1038/321674a0. PMID 3713851. We have developed a method for the partial automation of DNA sequence analysis. Fluorescence detection of the DNA fragments is accomplished by means of a fluorophore covalently attached to the oligonucleotide primer used in enzymatic DNA sequence analysis. A different coloured fluorophore is used for each of the reactions specific for the bases A, C, G and T. The reaction mixtures are combined and co-electrophoresed down a single polyacrylamide gel tube, the separated fluorescent bands of DNA are detected near the bottom of the tube, and the sequence information is acquired directly by computer.
↑ Smith LM, Fung S, Hunkapiller MW, Hunkapiller TJ, Hood LE (1985). "The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis". Nucleic Acids Res. 13 (7): 2399–412. doi:10.1093/nar/13.7.2399. PMC 341163. PMID 4000959.
↑ "Phred - Quality Base Calling". Retrieved 2011-02-24.
↑ "Base-calling for next-generation sequencing platforms — Brief Bioinform". Retrieved 2011-02-24.
↑ Murphy, K.; Berg, K.; Eshleman, J. (2005). "Sequencing of genomic DNA by combined amplification and cycle sequencing reaction". Clinical chemistry 51 (1): 35–39.
↑ Sengupta, D.; Cookson, B. (2010). "SeqSharp: A general approach for improving cycle-sequencing that facilitates a robust one-step combined amplification and sequencing method". The Journal of molecular diagnostics : JMD 12 (3): 272–277.
↑ Polymerase chain reaction

[1] DNA

[2] Griffith experiment

[3] Hershey–Chase experiment

[4] Hershey, A.D. and Chase, M. (1952) Independent functions of viral protein and nucleic acid in growth of bacteriophage. J Gen Physiol. 36:39–56.

[5] A very–MacLeod–McCarty experiment

[6] Base pair

[7] Pyrimidine

[8] Cytosine

[9] Nucleoside

[10] Nucleotide

[11] Phosphodiester bond

[12] A-DNA

[13] ttp://en.wikipedia.org/wiki/Z-DNA

[14] Noncoding DNA

[15] DNA supercoil

[Vologodskii1979-16] Vologodskii AV, Lukashin AV, Anshelevich VV; et al. (1979). "Fluctuations in superhelical DNA". Nucleic Acids Res. 6: 967–682. doi:10.1093/nar/6.3.967. {{cite journal}}: Explicit use of et al. in: |author= (help)CS1 maint: multiple names: authors list (link)

[17] H. S. Chawla (2002). Introduction to Plant Biotechnology. Science Publishers. ISBN 1578082285.

[18] Kayne PS, Kim UJ, Han M, Mullen JR, Yoshizaki F, Grunstein M. Extremely conserved histone H4 N terminus is dispensable for growth but essential for repressing the silent mating loci in yeast. Cell. 1988 Oct 7;55(1):27-39. PMID 3048701

[pmid964248-19] Crane-Robinson C, Dancy SE, Bradbury EM, Garel A, Kovacs AM, Champagne M, Daune M (1976). "Structural studies of chicken erythrocyte histone H5". Eur. J. Biochem. 67 (2): 379–88. doi:10.1111/j.1432-1033.1976.tb10702.x. PMID 964248. {{cite journal}}: Unknown parameter |month= ignored (help)CS1 maint: multiple names: authors list (link)

[pmid689022-20] Aviles FJ, Chapman GE, Kneale GG, Crane-Robinson C, Bradbury EM (1978). "The conformation of histone H5. Isolation and characterisation of the globular segment". Eur. J. Biochem. 88 (2): 363–71. doi:10.1111/j.1432-1033.1978.tb12457.x. PMID 689022. {{cite journal}}: Unknown parameter |month= ignored (help)CS1 maint: multiple names: authors list (link)

[21] ttp://en.wikipedia.org/wiki/DNA-binding_domain

[22] ttp://en.wikipedia.org/wiki/DNA-binding_domain

[23] ttp://en.wikipedia.org/wiki/DNA_sequencing

[24] ttp://en.wikipedia.org/wiki/DNA_sequencing

[25] DNA sequencing

[26] Smith LM, Sanders JZ, Kaiser RJ; et al. (1986). "Fluorescence detection in automated DNA sequence analysis". Nature. 321 (6071): 674–9. doi:10.1038/321674a0. PMID 3713851. We have developed a method for the partial automation of DNA sequence analysis. Fluorescence detection of the DNA fragments is accomplished by means of a fluorophore covalently attached to the oligonucleotide primer used in enzymatic DNA sequence analysis. A different coloured fluorophore is used for each of the reactions specific for the bases A, C, G and T. The reaction mixtures are combined and co-electrophoresed down a single polyacrylamide gel tube, the separated fluorescent bands of DNA are detected near the bottom of the tube, and the sequence information is acquired directly by computer. {{cite journal}}: Explicit use of et al. in: |author= (help)CS1 maint: multiple names: authors list (link)

[27] Smith LM, Fung S, Hunkapiller MW, Hunkapiller TJ, Hood LE (1985). "The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis". Nucleic Acids Res. 13 (7): 2399–412. doi:10.1093/nar/13.7.2399. PMC 341163. PMID 4000959. {{cite journal}}: Unknown parameter |month= ignored (help)CS1 maint: multiple names: authors list (link)

[urlPhred_-_Quality_Base_Calling-28] "Phred - Quality Base Calling". Retrieved 2011-02-24.

[urlBase-calling_for_next-generation_sequencing_platforms_—_Brief_Bioinform-29] "Base-calling for next-generation sequencing platforms — Brief Bioinform". Retrieved 2011-02-24.

[30] Murphy, K.; Berg, K.; Eshleman, J. (2005). "Sequencing of genomic DNA by combined amplification and cycle sequencing reaction". Clinical chemistry 51 (1): 35–39.

[31] Sengupta, D.; Cookson, B. (2010). "SeqSharp: A general approach for improving cycle-sequencing that facilitates a robust one-step combined amplification and sequencing method". The Journal of molecular diagnostics : JMD 12 (3): 272–277.

[32] Polymerase chain reaction

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]