Structural Biochemistry/Bioinformatics/Protein Fold Recognition

Overview

One of the central problems in bioinformatics is understanding the relationship between the amino acid sequence, structure, and function of proteins. Knowing a protein's three-dimensional structure aids drug development, enzyme engineering, and the analysis of protein function.

The relationship between structural space, the conservation of structures, and protein homology remains much debated. Fragment-based prediction of tertiary structure, however, re-uses structural fragments without requiring homology between the target sequence and the fragment source.

One of the many methods that researchers use to predict the tertiary structure of proteins is GenTHREADER, which aids both in detecting suitable protein templates and in producing accurate sequence-structure alignments. Most fold-recognition methods score alignments between protein sequence profiles on the basis of known protein structures. Two variants of GenTHREADER, pGenTHREADER and pDomTHREADER, recognize and align protein sequences in order to analyze their relationship to protein structure and function. Both take similar inputs, namely protein sequence profiles and structural information, and both use the same core alignment algorithm.
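Both variants share one core alignment algorithm and differ mainly in the scores fed into it. That core can therefore be pictured as dynamic programming over a pluggable position-pair score. The sketch below is a minimal stand-in under that assumption: a plain Needleman-Wunsch global alignment with a linear gap penalty. The name `global_align`, the default gap value, and the toy match/mismatch scores are illustrative choices. They are not details of GenTHREADER, whose actual alignment procedure and gap model this section does not specify.

```python
from typing import Callable, List, Tuple


def global_align(
    n: int,
    m: int,
    score: Callable[[int, int], float],
    gap: float = -4.0,  # illustrative linear gap penalty, not a GenTHREADER value
) -> Tuple[float, List[Tuple[int, int]]]:
    """Needleman-Wunsch global alignment of target positions 0..n-1 against
    template positions 0..m-1, using an arbitrary position-pair score.

    Returns the best total score and the aligned (target, template) index pairs.
    """
    # dp[i][j] = best score for aligning the first i target and first j template positions
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # align target i-1 with template j-1
                dp[i - 1][j] + gap,  # target residue opposite a gap
                dp[i][j - 1] + gap,  # template residue opposite a gap
            )
    # Trace back from the bottom-right cell to recover the aligned pairs
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    target, template = "ACGT", "AGT"
    best, pairs = global_align(
        len(target),
        len(template),
        lambda i, j: 2.0 if target[i] == template[j] else -1.0,  # toy identity score
        gap=-2.0,
    )
    print(best, pairs)  # 4.0 [(0, 0), (2, 1), (3, 2)]
```

Swapping in a different `score` function, for example one built from profiles, leaves the alignment routine unchanged. This is the sense in which one core algorithm can serve several methods.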

pGenTHREADER performs profile-profile comparisons using position-specific scoring matrices built from sequence databases from which coiled-coil regions and transmembrane segments have been filtered out. Its final score combines a sequence-profile score and a profile-sequence score, which allowed profile-profile matches to score higher than other matches. In addition, a hydrophobic burial term is added to bias the alignment, favoring placements of the target sequence in which its hydrophobic residues fall at buried positions in the template.
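The combined score described above can be sketched as a single position-pair function. This is a hypothetical reading, not pGenTHREADER's published scoring function. It assumes the profile-profile score is the plain sum of three terms:

* a profile-sequence term, the target PSSM evaluated at the template residue;
* a sequence-profile term, the template PSSM evaluated at the target residue;
* a burial term that rewards hydrophobic target residues at buried template positions.

The residue set `HYDROPHOBIC`, the `template_buried` flags, and `burial_weight` are all illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical residue set treated as hydrophobic by the burial term.
HYDROPHOBIC = set("AVILMFWC")

# One {amino acid: log-odds score} row per sequence position.
PSSM = List[Dict[str, float]]


def profile_profile_score(
    i: int,
    j: int,
    target_seq: str,
    target_pssm: PSSM,
    template_seq: str,
    template_pssm: PSSM,
    template_buried: Sequence[bool],
    burial_weight: float = 1.0,  # hypothetical weight of the burial term
) -> float:
    """Score for aligning target position i with template position j."""
    # Profile-sequence direction: how well the target profile accepts the template residue.
    profile_seq = target_pssm[i][template_seq[j]]
    # Sequence-profile direction: how well the template profile accepts the target residue.
    seq_profile = template_pssm[j][target_seq[i]]
    # Hypothetical burial term: reward a hydrophobic target residue placed at a
    # buried template position and penalize it at an exposed one.
    burial = 0.0
    if target_seq[i] in HYDROPHOBIC:
        burial = burial_weight if template_buried[j] else -burial_weight
    return profile_seq + seq_profile + burial


if __name__ == "__main__":
    # Toy PSSMs for a target "LK" and a template "IK" (made-up numbers).
    target_pssm = [{"L": 3.0, "I": 2.0, "K": -2.0}, {"K": 4.0, "I": -3.0, "L": -2.0}]
    template_pssm = [{"I": 3.0, "L": 1.0, "K": -3.0}, {"K": 5.0, "L": -2.0, "I": -3.0}]
    buried = [True, False]
    print(profile_profile_score(0, 0, "LK", target_pssm, "IK", template_pssm, buried))  # 4.0
    print(profile_profile_score(0, 1, "LK", target_pssm, "IK", template_pssm, buried))  # -5.0
```

A function like this could serve as the position-pair score of a dynamic-programming aligner. Hydrophobic target residues are then pulled toward buried template positions without the alignment algorithm itself changing.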

To evaluate the methods, the number of equivalent residues was calculated from the best result for each template-target pair rather than from the result each method selected; this separates alignment accuracy from template selection. Picking the best result significantly improves performance for chains shorter than 200 residues but brings little improvement for longer chains. pGenTHREADER performs significantly better than other methods when the fold-recognition relationship between target and template is more distant; for closer relationships, other methods tend to show greater improvement. These other methods combine fold recognition with side-chain optimization and model quality assessment.
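The evaluation idea can be illustrated by comparing a method's top-ranked hit with its best hit for the same target. In this sketch, an alignment is a set of (target index, template index) pairs. "Equivalent residues" is approximated as the pairs that a predicted alignment shares with a reference structural alignment. That definition and the names `equivalent_residues` and `top_ranked_vs_best` are assumptions for illustration, since the study's exact equivalence criterion is not given here.

```python
from typing import List, Set, Tuple

# An alignment as a set of (target index, template index) residue pairs.
Alignment = Set[Tuple[int, int]]


def equivalent_residues(predicted: Alignment, reference: Alignment) -> int:
    """Hypothetical measure: residue pairs in the predicted alignment that the
    reference structural alignment also contains."""
    return len(predicted & reference)


def top_ranked_vs_best(ranked_hits: List[Tuple[Alignment, Alignment]]) -> Tuple[int, int]:
    """ranked_hits holds (predicted, reference) alignments for one target,
    one entry per template, in the method's own ranking order.

    Returns (count for the top-ranked hit, best count over all hits). The first
    number mixes template selection with alignment accuracy; the second
    isolates alignment accuracy.
    """
    if not ranked_hits:
        return 0, 0
    counts = [equivalent_residues(pred, ref) for pred, ref in ranked_hits]
    return counts[0], max(counts)


if __name__ == "__main__":
    hits = [
        ({(0, 0), (1, 1)}, {(0, 1), (1, 2)}),  # top-ranked template, misaligned
        ({(0, 0), (1, 1), (2, 2)}, {(0, 0), (1, 1), (2, 3)}),  # lower-ranked, better aligned
    ]
    print(top_ranked_vs_best(hits))  # (0, 2)
```

A gap between the two numbers suggests the method produced a more accurate alignment than the one it ranked first. In that case, template selection rather than alignment limited its result.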