Proteomics/Introduction to Proteomics/Bioinformatics

Bioinformatics in Proteomics

Figure: A cabinet from the IBM Blue Gene/L supercomputer.

As in other areas of biology, high-throughput methods generate large amounts of data, which has increased reliance on computers for data acquisition, storage, and analysis. The internet has also enabled collaboration and data sharing that was not previously possible, leading to large public databases with contributors all over the world. Many databases exist for protein-related information, such as the Protein Data Bank (PDB), which holds structure and sequence information for proteins whose three-dimensional structure has been determined experimentally, most often by X-ray crystallography. ExPASy is a popular and well-curated resource for proteomics databases and tools, including the PROSITE database of protein features and domains, protein BLAST (Basic Local Alignment Search Tool, for similarity searching), and structure prediction tools. NCBI also provides well-integrated, searchable resources for many types of data, including proteins.
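To make "similarity searching" concrete, the following minimal sketch scores the best local alignment between two protein sequences using the Smith-Waterman dynamic programming method. This is not how BLAST itself is implemented: BLAST uses a faster seed-and-extend heuristic to approximate this kind of local alignment. The function name and the match, mismatch, and gap values are illustrative assumptions; real tools use substitution matrices such as BLOSUM62 rather than a flat match score.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Toy scoring scheme (assumed for illustration): a fixed reward for
    identical residues, a fixed penalty for mismatches and for gaps.
    """
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Local alignment: a cell never drops below zero, so a new
            # alignment can start anywhere in either sequence.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two made-up sequences that share the region "ACDE".
    print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))
```

A higher score indicates a longer or better-conserved shared region. Database search tools additionally report a statistical significance for each hit, which this sketch does not attempt.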

As with other bioinformatics resources, in silico discovery is not meant to replace lab techniques, but to supplement work done in a wet lab. For example, if a sequence-based localization tool supports the hypothesis that a protein is a transmembrane protein, the result is probably still worth confirming experimentally before drawing a conclusion. However, bioinformatics tools can save a great deal of time: they can suggest a starting point for experiments, narrow down a problem domain, or offer potential solutions to problems that would be very difficult or impossible to resolve experimentally, such as protein folding. Protein folding has become a benchmark application for many supercomputers and distributed computing systems, such as IBM's Blue Gene ^[1] and Stanford's Folding@Home project. Distributed computing uses many independent client nodes that connect to a master server, obtain data to process, and send back results, which makes it well suited to use over LANs and the internet. Projects like Grid.org and Folding@Home can be run by anyone on their own computers, and have attracted thousands of participants. Although computational folding is not yet a replacement for structure determination by crystallography, it can provide a reasonable estimate of a structure that can be investigated until the actual structure is elucidated.
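The master and client pattern described above can be sketched in a few lines. This is only a single-machine simulation, with threads standing in for client nodes and an in-memory queue standing in for the master server; it does not reflect how Folding@Home or Grid.org are actually implemented. The names `run_master` and `process_work_unit` are hypothetical, and counting hydrophobic residues (with an assumed residue set) is a stand-in for a genuinely expensive computation such as a folding simulation.

```python
import queue
import threading


def process_work_unit(unit: str) -> int:
    """Stand-in for expensive work: count residues from an assumed
    hydrophobic set in a sequence fragment."""
    return sum(1 for residue in unit if residue in "AVILMFWC")


def run_master(work_units: list[str], n_clients: int = 3) -> dict[int, int]:
    """Hand out work units to clients and collect results by unit id."""
    todo: queue.Queue = queue.Queue()
    for unit_id, unit in enumerate(work_units):
        todo.put((unit_id, unit))

    results: dict[int, int] = {}
    lock = threading.Lock()

    def client() -> None:
        # Each client repeatedly asks for work until none is left.
        while True:
            try:
                unit_id, unit = todo.get_nowait()
            except queue.Empty:
                return
            value = process_work_unit(unit)
            # "Send back" the result to the master's result table.
            with lock:
                results[unit_id] = value

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    return results


if __name__ == "__main__":
    print(run_master(["AVIL", "GGSS", "MKTW"]))
```

Because each work unit is independent, clients never need to talk to each other, only to the master. That independence is what lets such projects tolerate slow or unreliable connections between volunteers' computers and the server.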

[1]