Sequenced Genomes
Thanks to modern DNA sequencing techniques, many genomes have been sequenced and analyzed. A famous example is the human genome, sequenced through the Human Genome Project.
Human Genome Project
The Human Genome Project was an international scientific research effort to map and sequence the entire human genome. The project was initially led by James D. Watson at the US National Institutes of Health. Research centers around the world took part, including institutions in France, Germany, Japan, China, the United Kingdom, and India. At the time this was written, about 92.3% of the genome had been sequenced. The remaining regions are hard to sequence and assemble because they consist largely of long, highly repetitive non-coding sequences, sometimes loosely called "junk" DNA.
Among the project's key findings was that the genomes of any two humans are about 99.9% identical.
Homology
Sequencing genomes allows scientists to identify homologous proteins and to establish evolutionary relationships. If a newly discovered protein is homologous to a protein of known function, scientists can use that homology to make an educated guess about how the new protein functions.
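Homology is usually inferred from sequence similarity. The Needleman-Wunsch dynamic-programming algorithm computes the best global alignment score of two sequences. The sketch below is a minimal version of it. It assumes a deliberately simple scoring scheme (match +1, mismatch -1, gap -2) chosen for illustration rather than a real substitution matrix, and the function name `nw_score` is hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of sequences a and b
    (Needleman-Wunsch with a linear gap penalty; scores are illustrative)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "ACGT"))
    print(nw_score("ACGT", "AGT"))
```

Higher scores indicate more similar sequences. In practice, homology searches use tools such as BLAST, which rely on substitution matrices and statistical significance rather than a raw score like this one.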
The Impact of Sequencing on Medicine
The ability to sequence human genomes quickly may have significant impacts on medicine. Knowledge of genes and of an individual's DNA has already given scientists a way to estimate an individual's risk of certain diseases. Genome sequences also allow researchers to analyze chromosomal structure, the effects of evolution on the genome, and protein structures and functions. In the future, gene therapy, genomic medicine, and preventive treatments may reduce the likelihood of disease and allow drug manufacturers to tailor drugs to specific individuals.
Sequenced Eukaryotic Genomes
Eukaryotes are organisms whose cells contain complex, membrane-bound organelles. The defining characteristic that sets eukaryotes apart from prokaryotes is the nucleus. This compartment is bounded by the nuclear envelope and contains most of the cell's genetic information.
The first eukaryotic genome to be sequenced was that of Saccharomyces cerevisiae (S. cerevisiae), commonly known as brewer's or baker's yeast, in 1996. Because of its importance in baking and brewing, S. cerevisiae is one of the most intensively studied eukaryotic model organisms in molecular and cell biology. It plays a role similar to that of E. coli in the study of prokaryotes. Many proteins that are important in humans are studied by examining their homologs in yeast. For example, many signaling proteins and protein-processing enzymes were discovered with the help of the yeast genome.
Other fully sequenced organisms include the roundworm (Caenorhabditis elegans), the fruit fly (Drosophila melanogaster), the pufferfish (the first vertebrate to be sequenced after humans), and the plant Arabidopsis thaliana.
The tables below are taken from Wikipedia's list of sequenced eukaryotic genomes.
Protists
Chromista
The Chromista are a group of protists that contains the algal phyla Heterokontophyta, Haptophyta and Cryptophyta. Members of this group are mostly studied for evolutionary interest.
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 464[1] | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001[1] |
| Thalassiosira pseudonana Strain: CCMP 1335 | Diatom | | 2.5 Mb | 11,242[2] | Joint Genome Institute and the University of Washington | 2004[2] |
| Phaeodactylum tricornutum Strain: CCAP1055/1 | Diatom | | 27.4 Mb | 10,402 | Joint Genome Institute | 2008[3] |
Alveolata
Alveolata are a group of protists which includes the Ciliophora, Apicomplexa and Dinoflagellata. Members of this group are of particular interest to science as the cause of serious human and livestock diseases.
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Babesia bovis | Parasitic protozoan | Cattle pathogen | 8.2 Mb | 3,671 | | 2007[4] |
| Cryptosporidium hominis Strain: TU502 | Parasitic protozoan | Human pathogen | 10.4 Mb | 3,994[5] | Virginia Commonwealth University | 2004[5] |
| Cryptosporidium parvum C- or genotype 2 isolate | Parasitic protozoan | Human pathogen | 16.5 Mb | 3,807[6] | UCSF and University of Minnesota | 2004[6] |
| Paramecium tetraurelia | Ciliate | Model organism | 72 Mb | 39,642[7] | Genoscope | 2006[7] |
| Plasmodium falciparum Clone: 3D7 | Parasitic protozoan | Human pathogen (malaria) | 22.9 Mb | 5,268[8] | Malaria Genome Project Consortium | 2002[8] |
| Plasmodium knowlesi | Parasitic protozoan | Primate pathogen (malaria) | 23.5 Mb | 5,188[9] | | 2008[9] |
| Plasmodium vivax | Parasitic protozoan | Human pathogen (malaria) | 26.8 Mb | 5,433[10] | | 2008[10] |
| Plasmodium yoelii yoelii Strain: 17XNL | Parasitic protozoan | Rodent pathogen (malaria) | 23.1 Mb | 5,878[11] | TIGR and NMRC | 2002[11] |
| Tetrahymena thermophila | Ciliate | Model organism | 104 Mb | 27,000[12] | | 2006[12] |
| Theileria parva Strain: Muguga | Parasitic protozoan | Cattle pathogen (African East Coast fever) | 8.3 Mb | 4,035[13] | TIGR and the International Livestock Research Institute | 2005[13] |
| Theileria annulata Ankara clone C9 | Parasitic protozoan | Cattle pathogen | 8.3 Mb | 3,792 | Sanger | 2005[14] |
Excavata
Excavata is a group of related free-living and symbiotic protists that includes the Metamonada, Loukozoa, Euglenozoa and Percolozoa. Many of its members are studied because of their role in human disease.
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Leishmania major Strain: Friedlin | Parasitic protozoan | Human pathogen | 32.8 Mb | 8,272[15] | Sanger Institute | 2005[15] |
| Giardia lamblia | Parasitic protozoan | Human pathogen | 11.7 Mb | 6,470[16] | | 2007[16] |
| Trichomonas vaginalis | Parasitic protozoan | Human pathogen (trichomoniasis) | 160 Mb | 59,681[17] | TIGR | 2007[17] |
| Trypanosoma brucei Strain: TREU927/4 GUTat10.1 | Parasitic protozoan | Human pathogen (sleeping sickness) | 26 Mb | 9,068[18] | Sanger Institute and TIGR | 2005[18] |
| Trypanosoma cruzi Strain: CL Brener TC3 | Parasitic protozoan | Human pathogen (Chagas disease) | 34 Mb | 22,570[19] | TIGR, Seattle Biomedical Research Institute and Uppsala University | 2005[19] |
Amoebozoa
Amoebozoa are a group of motile amoeboid protists. Members of this group move or feed by means of temporary projections called pseudopods. The best-known members of this group are the slime molds, which have been studied for centuries. Other members include the Archamoebae, Tubulinea and Flabellinea. Some Amoebozoa cause disease.
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Dictyostelium discoideum Strain: AX4 | Slime mold | Model organism | 34 Mb | 12,500[20] | Consortium from University of Cologne, Baylor College of Medicine and the Sanger Centre | 2005[20] |
| Entamoeba histolytica HM1:IMSS | Parasitic protozoan | Human pathogen (amoebic dysentery) | 23.8 Mb | 9,938[21] | TIGR, Sanger Institute and the London School of Hygiene and Tropical Medicine | 2005[21] |
Plants
Higher plants
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana Ecotype: Columbia | Wild mustard | Model plant | 120 Mb | 25,498[22] | Arabidopsis Genome Initiative[23] | 2000[22] |
| Brassica napus | Rapeseed | Oil plant | 1,100 Mb | | Bayer CropScience | 2009[24] |
| Oryza sativa ssp. indica | Rice | Crop and model organism | 420 Mb | 32,000-50,000[25] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002[25] |
| Oryza sativa ssp. japonica | Rice | Crop and model organism | 466 Mb | 46,022-55,615[26] | Syngenta and Myriad Genetics | 2002[26] |
| Ostreococcus tauri | Green alga | Simple eukaryote | 12.6 Mb | | Laboratoire Arago | 2006[27] |
| Physcomitrella patens | Bryophyte | Model organism; early diverging land plant | 500 Mb | 39,458[28] | US Department of Energy Office of Science Joint Genome Institute | 2008[28] |
| Populus trichocarpa | Balsam poplar or black cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 45,555[29] | The International Poplar Genome Consortium | 2006[29] |
| Vitis vinifera | Grapevine PN40024 | Fruit crop | 490 Mb[30] | 30,434[30] | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007[30] |
| Zea mays ssp. mays | Corn (maize) | Fruit crop | 2,800 Mb | 50,000-60,000 | NSF | 2008[31] |
Algae
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Cyanidioschyzon merolae Strain: 10D | Red alga | Simple eukaryote | 16.5 Mb | 5,331[32] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004[32] |
| Thalassiosira pseudonana[33] | Heterokont | | | | | |
| Chlamydomonas reinhardtii[34] | | Model organism | | | | 2007[34] |
| Ostreococcus tauri[33] | Chlorophyte | | | | | |
Fungi
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Ashbya gossypii Strain: ATCC 10895 | Fungus | Plant pathogen | 9.2 Mb | 4,718[35] | Syngenta AG and University of Basel | 2004[35] |
| Aspergillus fumigatus Strain: Af293 | Fungus | Human pathogen | 29.4 Mb | 9,926[36] | Sanger Institute, University of Manchester, TIGR, Institut Pasteur, Nagasaki University, University of Salamanca and OpGen | 2005[36] |
| Aspergillus nidulans Strain: FGSC A4 | Fungus | Model organism | 30 Mb | 9,500[37] | | 2005[37] |
| Aspergillus niger Strain: CBS 513.88 | Fungus | Biotechnology (fermentation) | 33.9 Mb | 14,165[38] | | 2007[38] |
| Aspergillus oryzae Strain: RIB40 | Fungus | Used to ferment soy | 37 Mb | 12,074[39] | National Institute of Technology and Evaluation | 2005[39] |
| Candida glabrata Strain: CBS138 | Fungus | Human pathogen | 12.3 Mb | 5,283[40] | Génolevures Consortium[41] | 2004[40] |
| Cryptococcus (Filobasidiella) neoformans JEC21 | Fungus | Human pathogen | 20 Mb | 6,500[42] | TIGR and Stanford University | 2005[42] |
| Debaryomyces hansenii Strain: CBS767 | Yeast | Cheese ripening | 12.2 Mb | 6,906[40] | Génolevures Consortium | 2004[40] |
| Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997[43] | Genoscope and Université Blaise Pascal | 2001[43] |
| Kluyveromyces lactis Strain: CLIB210 | Yeast | | 10-12 Mb | 5,329[40] | Génolevures Consortium | 2004[40] |
| Magnaporthe grisea | Fungus | Plant pathogen | 37.8 Mb | 11,109[44] | | 2005[44] |
| Neurospora crassa | Fungus | Model eukaryote | 40 Mb | 10,082[37] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003[37] |
| Saccharomyces cerevisiae Strain: S288C | Baker's yeast | Model eukaryote | 12.1 Mb | 6,294[45] | International Collaboration for the Yeast Genome Sequencing[46] | 1996[45] |
| Schizosaccharomyces pombe Strain: 972h | Yeast | Model eukaryote | 14 Mb | 4,824[47] | Sanger Institute and Cold Spring Harbor Laboratory | 2002[47] |
| Yarrowia lipolytica Strain: CLIB99 | Yeast | Industrial uses | 20 Mb | 6,703[40] | Génolevures Consortium | 2004[40] |
Animals
Mammals
| Organism | Type | Shotgun Coverage | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Bos taurus | Cow | 6× | 3.0 Gb[48][49] | 22,000[50] | Cattle Genome Sequencing International Consortium | 2009 |
| Canis lupus familiaris | Dog | 7.6× | 2.4 Gb[51] | 19,300[51] | Broad Institute and Agencourt Bioscience | 2005[51] |
| Cavia porcellus | Guinea pig | 2× | 3.4 Gb | | The Genome Sequencing Platform, The Genome Assembly Team[49] | |
| Dasypus novemcinctus | Nine-banded armadillo | 2×[52] | 3.0 Gb | | Broad Institute[49] | |
| Echinops telfairi | Hedgehog-Tenrec | 2×[52] | | | Broad Institute | |
| Equus caballus | Horse | 6.8× | 2.1 Gb[49] | | Broad Institute et al.[49] | 2007[53] |
| Erinaceus europaeus | Western European hedgehog | 2×[52] | | | Broad Institute | |
| Felis catus | Cat | 2× | 3 Gb | 20,285 | The Genome Sequencing Platform, The Genome Assembly Team[49] | 2007[54] |
| Homo sapiens | Human | | 3.2 Gb[55] | 25,000[55] | Human Genome Project Consortium and Celera Genomics | Draft 2001[56][57]; complete 2006[58] |
| Loxodonta africana | African elephant | 2×[52] | 3 Gb | | Broad Institute | |
| Macaca mulatta | Rhesus macaque | 6× | | | Macaque Genome Sequencing Consortium[49] | |
| Microcebus murinus | Gray mouse lemur | 2×[52] | | | The Genome Sequencing Platform, The Genome Assembly Team[49] | |
| Monodelphis domestica | Gray short-tailed opossum | | 3.5 Gb | 18,000-20,000 | Broad Institute et al. | 2007[49][59] |
| Mus musculus Strain: C57BL/6J | Mouse | | 2.5 Gb | 24,174[60] | International Collaboration for the Mouse Genome Sequencing[61] | 2002[60] |
| Myotis lucifugus | Little brown bat | 2×[49] | | | Broad Institute | |
| Ochotona princeps | American pika | 2×[52] | | | Broad Institute | |
| Ornithorhynchus anatinus[62] | Platypus | 6×[49] | | | Washington University | |
| Oryctolagus cuniculus | Rabbit | 2×[52] | 2.5 Gb | | Broad Institute et al.[49] | |
| Otolemur garnettii | Small-eared galago, or bushbaby | 2×[52] | | | Broad Institute | |
| Pan troglodytes | Chimpanzee | 6×[49] | 3.1 Gb | | Chimpanzee Sequencing and Analysis Consortium | 2005[63] |
| Pongo pygmaeus | Orangutan | | 3.0 Gb | | Institute for Molecular Biotechnology[49] | |
| Rattus norvegicus | Rat | 1.8× or better | 2.8 Gb[49] | 21,166[64] | Rat Genome Sequencing Project Consortium | 2004[64] |
| Sorex araneus | European shrew | 2×[52] | 3.0 Gb[49] | | The Genome Sequencing Platform, The Genome Assembly Team[49] | |
| Spermophilus tridecemlineatus | Thirteen-lined ground squirrel | 2× | | | The Genome Sequencing Platform, The Genome Assembly Team[49] | |
| Tupaia belangeri | Northern tree shrew | 2× | | | Broad Institute[49] | |
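The "Shotgun Coverage" column measures redundancy. Coverage c is the total number of sequenced bases divided by the genome size, c = N × L / G for N reads of length L on a genome of G bases. Under the Lander-Waterman model, the expected fraction of the genome that no read touches is about e^(-c). That model assumes reads start at independent, uniformly random positions, an idealization that real genomes with repeats and cloning biases do not meet. The sketch below applies these two formulas. The function names are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average shotgun coverage c = N * L / G (bases sequenced per genome base)."""
    return num_reads * read_length / genome_size


def fraction_uncovered(c: float) -> float:
    """Lander-Waterman estimate of the fraction of the genome covered by no read.

    Assumes read start positions are independent and uniformly random, so the
    number of reads covering a given base is Poisson-distributed with mean c."""
    return math.exp(-c)


if __name__ == "__main__":
    # For example, 10 million reads of 600 bases over a 3.0 Gb genome give c = 2.
    print(coverage(10_000_000, 600, 3_000_000_000))
    for c in (2.0, 6.0):
        print(f"{c:.0f}x coverage: about {fraction_uncovered(c):.2%} of the genome unsequenced")
```

Under this idealized model, the 2× entries in the table would leave roughly 13.5% of the genome unsequenced, compared with about 0.25% at 6×. This is why low-coverage assemblies remain drafts with many gaps.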
Insects
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Anopheles gambiae Strain: PEST | Mosquito | Vector of malaria | 278 Mb | 13,683[65] | Celera Genomics and Genoscope | 2002[65] |
| Apis mellifera | Honey bee | Model for eusocial behavior | 1800 Mb | 10,157[66] | The Honeybee Genome Sequencing Consortium | 2006[66] |
| Bombyx mori Strain: p50T | Moth (domestic silkworm) | Silk production | 530 Mb | | University of Tokyo and National Institute of Agrobiological Sciences | 2004[67] |
| Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600[68] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000[68] |
Nematodes
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Caenorhabditis briggsae | Nematode worm | For comparison with C. elegans | 104 Mb | 19,500[69] | Washington University, Sanger Institute and Cold Spring Harbor Laboratory | 2003[69] |
| Caenorhabditis elegans Strain: Bristol N2 | Nematode worm | Model animal | 100 Mb | 19,000[70] | Washington University and the Sanger Institute | 1998[70] |
| Meloidogyne hapla | Northern root-knot nematode | Vegetable pathogen | 54 Mb | 14,420[71] | | 2008[71] |
| Meloidogyne incognita | Southern root-knot nematode | Plant pathogen | 86 Mb | 19,212[72] | INRA, Genoscope and International M. incognita Genome Consortium[73] | 2008[72] |
| Pristionchus pacificus | Nematode worm | Model invertebrate | 169 Mb | 23,500[74] | Max-Planck Institute for Developmental Biology and Genome Sequencing Center, Washington University School of Medicine | 2008[74] |
Other animals
| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Ciona intestinalis | Tunicate | Simple chordate | 116.7 Mb | 16,000[75] | Joint Genome Institute | 2003[75] |
| Ciona savignyi | Tunicate | | 174 Mb | | Broad Institute | 2007[76] |
| Gallus gallus | Chicken | | 1000 Mb | 20,000-23,000[77] | International Chicken Genome Sequencing Consortium | 2004[77] |
| Strongylocentrotus purpuratus | Sea urchin | Model eukaryote | 814 Mb | 23,300[78] | Sea Urchin Genome Sequencing Consortium | 2006[78] |
| Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22-29,000[79] | International Fugu Genome Consortium[80] | 2002[81] |
| Tetraodon nigroviridis | Puffer fish | Vertebrate with compact genome | 340 Mb[82] | 22,400[82] | Genoscope and the Broad Institute | 2004[82] |
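The pufferfish entries carry the notes "Vertebrate with small genome" and "Vertebrate with compact genome". These notes can be made concrete by dividing each predicted gene count by its genome size. The sketch below does this for four organisms, using figures copied from the tables above. Gene counts are predictions, so the densities are only approximate. The helper name `genes_per_mb` is hypothetical.

```python
# Genome size (Mb) and predicted gene count, copied from the tables above.
GENOMES = {
    "Saccharomyces cerevisiae": (12.1, 6_294),
    "Drosophila melanogaster": (165, 13_600),
    "Tetraodon nigroviridis": (340, 22_400),
    "Homo sapiens": (3_200, 25_000),  # 3.2 Gb
}


def genes_per_mb(size_mb: float, genes: int) -> float:
    """Average number of predicted genes per megabase of genome."""
    return genes / size_mb


if __name__ == "__main__":
    # Print organisms from most to least gene-dense.
    ranked = sorted(GENOMES.items(), key=lambda item: genes_per_mb(*item[1]), reverse=True)
    for name, (size_mb, genes) in ranked:
        print(f"{name:26s} {genes_per_mb(size_mb, genes):6.1f} genes/Mb")
```

On these numbers, Tetraodon has about 66 predicted genes per megabase, versus about 8 for the human genome, even though the two gene counts are similar. The difference lies largely in the amount of non-coding DNA.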
Sequenced Bacterial Genomes
Several techniques have made fast, high-volume DNA sequencing possible, including sequencing with fluorescent dideoxynucleotide chain terminators and the "shotgun" method. The genome of the bacterium Haemophilus influenzae was determined in 1995 using the shotgun method. In this approach, the genomic DNA is cut randomly into fragments, and each fragment is sequenced. Computer programs then reconstruct the whole sequence by matching the overlapping regions between these fragments. The H. influenzae genome consists of 1,830,137 base pairs and encodes approximately 1740 proteins. Similar approaches have since been used to sequence the genomes of more than 100 bacterial and archaeal species. These include key model organisms such as E. coli, Salmonella typhimurium, and Archaeoglobus fulgidus, as well as pathogenic organisms such as Yersinia pestis (the cause of bubonic plague) and Bacillus anthracis (anthrax).1
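The overlap-matching step can be illustrated with a toy greedy assembler. It repeatedly merges the two fragments that share the longest suffix-prefix overlap. This is a minimal sketch that assumes error-free reads from a single strand and no repeated sequences. It is not the assembler used for H. influenzae. Real assemblers must also handle sequencing errors, both DNA strands, and repeats, which greedy merging can mis-assemble. The function names and the example sequence are made up for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (returns 0)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap until
    no overlaps remain; returns the resulting contig(s)."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop fragments that lie entirely inside another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(frags, 2):
            k = overlap(a, b, min_len)
            if k > best_len:
                best_len, best_pair = k, (a, b)
        if best_pair is None:
            break  # no remaining overlaps: fragments stay as separate contigs
        a, b = best_pair
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[best_len:])  # join, writing the shared overlap only once
    return frags


if __name__ == "__main__":
    # Overlapping fragments of the made-up sequence ATTAGACCTGCCGGAATAC.
    fragments = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
    print(greedy_assemble(fragments))
```

Run on the four fragments above, the sketch prints `['ATTAGACCTGCCGGAATAC']`, recovering the original sequence. Fragments that share no overlap are returned as separate contigs, which corresponds to the gaps left in a real draft assembly.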
References
1. Berg, Jeremy M.; Tymoczko, John L.; Stryer, Lubert (2007). Biochemistry, 6th ed. New York: W. H. Freeman. pp. 68-69, 78.
2. Voet, Donald; Voet, Judith G.; Pratt, Charlotte W. (2004). Fundamentals of Biochemistry.