Structural Biochemistry/Bioinformatics/Comparative Bioinformatics

Comparative Genomics

A vast number of genomes have now been sequenced. It is striking how similar the genomes of other species are to the human genome. For example, more than half of the genes in Drosophila (fruit fly) have human counterparts, even though the species does not look “human” at all. The similarity is even more striking when human genes are compared with those of other mammals.

The growing body of genome data has given rise to a new field: comparative genomics. Comparative genomics studies the relationships between the genomes of different species and uses them to learn more about genomes that are less well characterized. The field also attempts to answer many evolutionary questions. Recently, the draft sequence of the chimpanzee, our closest living relative, was completed. By comparing the chimpanzee and human genomes, scientists may learn how, in biological terms, humans and chimpanzees diverged from their common ancestor.

Example: Fruit fly - genome


Processes of Comparing Genomes

Comparative genomics works by aligning sequences [1] of different organisms to identify patterns that operate over both large and small distances. Aligning mouse chromosomes with human chromosomes, for example, shows that 99% of our protein-coding genes align with homologous sequences in mice. Underlying such analyses is the principle that DNA sequences that are highly conserved are likely to be functionally important. A common assumption is that adding more comparative genomes to the alignment helps distinguish functionally significant from irrelevant conserved sequences.
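The idea that highly conserved sequence is likely to be functionally important can be illustrated with a small sketch. It assumes the sequences have already been aligned (equal length, "-" for gaps) and scores each alignment column by the fraction of sequences that share the most common base. The function names, the window size, the threshold, and the toy alignment are all illustrative assumptions, not part of any standard tool.

```python
from collections import Counter

def column_conservation(aligned_seqs):
    """Fraction of sequences sharing the most common base at each alignment column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned_seqs):
        bases = [b for b in column if b != "-"]  # a gap never counts as agreement
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores

def conserved_windows(scores, window=5, threshold=0.9):
    """Start positions of windows whose mean conservation meets the threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits

# Toy, made-up alignment of three hypothetical species (not real genomic data).
alignment = [
    "ATGCCGTAAC",
    "ATGCCTTGAC",
    "ATGCCATCAC",
]
scores = column_conservation(alignment)
windows = conserved_windows(scores, window=5, threshold=0.9)
print(windows)
```

Adding more rows to the alignment makes the per-column score more discriminating, which is the intuition behind including more genomes in the comparison.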

If a genome of interest has not been sequenced, sequencing it completely from scratch and then comparing it with others would take a long time. Researchers can instead use related species whose genomes are already sequenced to make the task quicker and easier. To compare unsequenced genomes, genome science relies on shared synteny, the conserved arrangement of DNA segments on the chromosomes of related species. Wheat is an example: its genome is far larger than the human genome, so sequencing it from scratch would be a very large undertaking.
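Shared synteny can be sketched as a search for runs of genes that occur in the same order in two species. The example below is a simplified illustration under several assumptions: each gene appears once per list, gene names already identify counterparts across species, and inverted blocks are ignored. The function name and gene labels are hypothetical.

```python
def synteny_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both lists."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []

    def close_block():
        # Keep the open run only if it is long enough to count as a block.
        if len(current) >= min_len:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_b:
            close_block()  # no counterpart in species B breaks the run
            continue
        if current and position_b[gene] == position_b[current[-1]] + 1:
            current.append(gene)  # extends the run in both genomes
        else:
            close_block()
            current.append(gene)
    close_block()
    return blocks

# Hypothetical gene orders along one chromosome in two related species.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
species_b = ["g5", "g6", "g7", "x1", "g1", "g2", "g3"]
blocks = synteny_blocks(species_a, species_b)
print(blocks)
```

Once such blocks are known, the gene order of a sequenced relative can serve as a rough map for the corresponding region of the unsequenced genome.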

Benefits of Comparative Bioinformatics

1. Helps our understanding of the genetic basis of diseases in both animals and humans.
2. Increases our basic knowledge of the evolutionary pathways of related species.
3. Helps find new medical treatments and other means of benefiting human health.
4. Helps determine the function of human genes. For example, researchers can look in other animals for counterparts of human genes. If scientists have identified a particular gene in another animal and know what it does, a human gene with a similar sequence probably has a similar function. This process is called "annotating", defined as creating a set of comments, notations, and references describing the experimental and inferred information about a gene or protein.


1. Chloroplast genome and mitochondrial genome -- Chloroplasts and mitochondria also contain a significant amount of a cell's genetic information, so they should be considered in genomic studies. The chloroplast is the plant organelle that carries out photosynthesis. It possesses its own genome and can therefore replicate independently of the cell. Compared with DNA in the nucleus, chloroplast DNA evolves very slowly, and because it generally does not undergo recombination, it changes little from one generation to the next. As a result, scientists can use it to trace evolutionary relationships in detail. Genetic exchange has been found to occur between the chloroplast and the nucleus, and some chloroplast proteins are encoded by the chloroplast genome itself. Like the chloroplast genome, the mitochondrial genome of plants changes slowly in sequence over time. The mitochondria in a cell are built from products of both nuclear and mitochondrial DNA, which shows that the two genomes, nuclear and mitochondrial, interact.
2. Fruit flies -- Although the fruit fly genome is roughly 20 times smaller than the human genome, many of the fly's genes are similar to those in humans and control the same biological functions. Research on fruit flies has led to discoveries about the influence of genes on disease, animal development, population genetics, cell biology, neurobiology, behavior, physiology and evolution.
Researchers have found that two-thirds of the human genes known to be involved in cancer have counterparts in the fruit fly. Fruit flies are also used in Parkinson's research. When scientists put a human gene associated with Parkinson's disease into fruit flies, the flies displayed symptoms similar to those seen in humans with the disease. This suggests that fruit flies could serve as a new model in the search for a cure for Parkinson's.

DNA can also serve as an identification tool when the type of organism is unknown. Before sequencing DNA from a species of interest, first search GenBank, the public database of DNA sequences, to see whether that species has been sequenced before; GenBank also works as something of a taxonomy browser. Suppose, for example, that seahorses at an aquarium are to be sequenced to check whether they belong to a particular species, and the sequences available for seahorses are cytochrome b sequences. Only the cytochrome b region of the genome then needs to be sequenced. To do this, that region must first be amplified, that is, copied millions of times. Amplification requires primers complementary to the region, and researchers who have already sequenced it will have specified primers that can be reused. All sorts of tissue samples may be used, even from specimens that have recently died, as long as they have not started to decay. Finally, compare the newly obtained sequence with the sequences in the database: submit it as a query and ask the search engine for the best match. The results list the species whose sequences are most similar to the query, with the best match first.
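The final "best match" step can be sketched in a few lines. The example below is not how GenBank's search works internally; it swaps in a much simpler stand-in, the Jaccard similarity of shared k-mers (short subsequences), to rank reference sequences against a query. The species names and sequences are made up for illustration.

```python
def kmers(seq, k=4):
    """Set of all overlapping subsequences of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(query, reference, k=4):
    """Jaccard similarity of the two k-mer sets (1.0 means identical sets)."""
    a, b = kmers(query, k), kmers(reference, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_match(query, references, k=4):
    """Rank reference names by similarity to the query, best first."""
    scored = [(similarity(query, seq, k), name) for name, seq in references.items()]
    return sorted(scored, reverse=True)

# Hypothetical reference fragments standing in for database entries.
references = {
    "species_A": "ATGACCAACATCCGAAAATCCCACCCACTA",
    "species_B": "ATGGCAAGCCTACGAAAAACCCACCCCCTT",
    "species_C": "ATGACAAATATTCGTAAAACTCATCCATTA",
}
# Query differs from species_B at a single position.
query = "ATGGCAAGCCTACGAAAAACCCACCCTCTT"
ranking = best_match(query, references)
print(ranking[0][1])
```

A real identification would use an alignment-based search over full database records, but the ranking idea is the same: score every candidate and report the closest one.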

Comparative Eukaryotic Genomics

Organism        | Estimated Genome Size (Mb) | Estimated Number of Genes | Year Sequenced
Human           | 2,900                      | 20,000-25,000             | 2001
Mouse           | 2,600                      | 30,000                    | 2002
Pufferfish      | 365                        | 33,609                    | 2002
Rat             | 2,750                      | 20,973                    | 2004
Chimpanzee      | 3,100                      | 20,000-25,000             | 2005
Red Jungle Fowl | 1,000                      | 20,000-23,000             | 2004
Fruit Fly       | 137                        | 13,600                    | 2000
Mosquito        | 278                        | 46,000-56,000             | 2002
Fission Yeast   | 13.8                       | 4,824                     | 2002
Brewer's Yeast  | 12.7                       | 5,805                     | 1997
Protist         | 23                         | 5,300                     | 2002
Wall Cress      | 125                        | 25,498                    | 2000
Rice            | 430                        | 41,000                    | 2002
(Reference: Biology, Eighth Edition by Raven and Johnson)

Comparison of the human genome with those of other species:

1. Human vs. Pufferfish: The pufferfish was the first vertebrate whose genome sequence was compared with the human genome. The last common ancestor of humans and pufferfish lived about 450 million years ago, yet the two genomes still have many similarities: only about ¼ of human genes have no counterparts in pufferfish. The genomes differ sharply in repetitive DNA, however. Roughly half of human DNA is repetitive, whereas repetitive DNA makes up only about 17% of the pufferfish genome.

2. Human vs. Mouse: This was the first genome comparison made between two mammals, and the similarity is much greater. Humans and mice each have about 25,000 genes and, surprisingly, share about 99% of them. Humans and mice shared a common ancestor about 75 million years ago, far more recently than the human-pufferfish ancestor. Mouse DNA, however, was found to have mutated about twice as fast as human DNA. Only about 300 genes (roughly 1%) are unique to either organism, and the human genome has about 400 million more nucleotides than the mouse genome.

3. Human vs. Chimpanzee: The chimpanzee is one of the closest living relatives of humans. The two species shared a common ancestor only about 6 million years ago. In 2005, the chimpanzee genome sequence was published and compared with the human genome. Only a 1.06% difference in substitutions and a 1.5% difference in insertions and deletions were detected. Those insertions and deletions may account for some of the characteristics that distinguish humans from chimpanzees, including the lack of body hair and a larger cranium.

4. Human vs. Plants: An estimated 1/3 of the genes in plants are not found in mammals. Those genes encode distinctive plant characteristics such as photosynthesis and photosynthetic anatomy. The other 2/3 are very similar to genes in humans and other animals. These shared genes encode basic metabolism, genome replication and repair, RNA transcription and protein synthesis.
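The human vs. chimpanzee figures above separate two kinds of difference: substitutions and insertions/deletions. A minimal sketch of how such percentages could be tallied from a pairwise alignment follows. It assumes the two sequences are already aligned with "-" marking gaps, counts every gap position individually, and uses made-up toy sequences; published genome comparisons use more careful definitions.

```python
def divergence(aligned_a, aligned_b):
    """Percent substitutions (over aligned bases) and percent indel positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = indel_positions = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        if a == "-" or b == "-":
            indel_positions += 1  # base present in only one sequence
        else:
            compared += 1
            if a != b:
                substitutions += 1
    total = compared + indel_positions
    return {
        "substitution_pct": 100 * substitutions / compared if compared else 0.0,
        "indel_pct": 100 * indel_positions / total if total else 0.0,
    }

# Toy alignment (not real human or chimpanzee sequence).
seq_a = "ACGTACGTAC--GTACGTAC"
seq_b = "ACGTACCTACGGGTAC-TAC"
result = divergence(seq_a, seq_b)
print(result)
```

Here one of 17 aligned base pairs differs and 3 of 20 columns contain a gap, so the two kinds of change are reported separately, as in the comparison above.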