Structural Biochemistry/Advances and pitfalls of protein structural alignment

An interesting look at a paper by Hitomi Hasegawa and Liisa Holm:

The advances and pitfalls of protein structural alignment

Small Introduction

Structural comparison can open a window into the past, because proteins take part in a kind of evolutionary process. Sequence comparison alone has provided limited information and sometimes generates contradictory information for even slightly distant structures. This has led researchers to develop structure comparison methods that allow for flexibility and so generate the most biologically meaningful alignments. The analysis of protein sequences and structures has paved the way for a better understanding of proteins and their functions. It is important to note, however, that the variety of protein structures is far smaller than that of protein sequences, mainly because of the physical limitations of natural proteins.

The foundation for comparing protein structures was laid by visual analysis of illustrations of known structures. Although there is no universal criterion for when two proteins are structurally similar, similarity is usually recognizable on sight, and three-dimensional models have made visualizing proteins far easier. The most widespread use of structural alignments is to identify homologous residues, that is, residues encoded by the same codon in the genome of a common ancestor. The field of protein structural alignment has remained active over the years, and the number of new methods has doubled every five years.

Advances and Technological Development

Scoring schemes can be classified by whether they treat a structure as a three-dimensional, two-dimensional, or one-dimensional object. For 3D structures, similarity is derived from "positional deviations of equivalent atoms upon rigid-body superimposition." The balance between the size of the common core and the gap penalties defines the set of optimal solutions. Flexible aligners identify proteins that have undergone large conformational changes by chaining together a series of substructures within the protein. For 2D structures, similarity is derived from the relative differences of "intramolecular C(alpha)-C(alpha) distances." Because each deviation is measured relative to the distance itself, the same level of similarity tolerates larger absolute deviations for tertiary contacts than for local ones. Lastly, for 1D structures, profiles classify each residue according to its amino acid type and its backbone conformational state.

Once a score has been defined, the alignment is determined by finding the optimal set of correspondences. Fragment assembly algorithms can generate nonsequential alignments, while consistency-based scoring has been used to build multiple alignments from large databases of pairwise alignments. New structures are placed in the Protein Data Bank, and certain properties, such as the size and molecular shape of the protein, are considered first before insertion into this data bank.

Evaluations of structural alignments play a pivotal role. They measure the accuracy of the alignments, the ability of the alignment score to differentiate homologous from unrelated proteins in database-wide comparisons, and the 'quality' of the alignments.
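The distance-based (2D) comparison described above can be illustrated with a minimal Python sketch. It is not the scoring function of any particular program. The function names, the toy coordinates, and the choice of a plain mean of relative deviations are all assumptions made for illustration. The sketch shows two properties: the score does not require a superimposition, because intramolecular distances are unchanged by rotation and translation, and the same absolute deviation counts less for a long distance than for a short one.

```python
import math

def relative_deviation(d_a, d_b):
    """Difference between two distances, relative to their mean (hypothetical helper)."""
    mean = 0.5 * (d_a + d_b)
    return abs(d_a - d_b) / mean if mean > 0 else 0.0

def distance_matrix(coords):
    """All-against-all intramolecular distances for a list of (x, y, z) points."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def distance_score(coords_a, coords_b, pairs):
    """Mean relative distance deviation over all pairs of aligned residues.

    pairs holds (i, j) tuples: residue i of structure A is matched to residue j of B.
    A value of 0.0 means the aligned parts have identical internal geometry.
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    devs = []
    for m, (ia, ib) in enumerate(pairs):
        for ja, jb in pairs[m + 1:]:
            devs.append(relative_deviation(da[ia][ja], db[ib][jb]))
    return sum(devs) / len(devs) if devs else 0.0

def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body motion: rotate about the z axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Toy "C-alpha trace" (made-up coordinates, in angstroms).
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
         (8.0, 5.0, 2.0), (11.0, 4.0, 4.5)]
moved = rotate_z_and_shift(chain, math.radians(40.0), (10.0, -5.0, 2.0))
bent = chain[:-1] + [(11.0, 7.0, 4.5)]          # last residue displaced
identity = [(i, i) for i in range(len(chain))]  # residue i matched to residue i

print(distance_score(chain, moved, identity))   # approximately 0: rigid motion only
print(distance_score(chain, bent, identity))    # greater than 0: geometry changed
# The same 1 angstrom deviation weighs more at short range than at long range:
print(relative_deviation(3.8, 4.8), relative_deviation(20.0, 21.0))
```

A 3D score would instead superimpose the two structures first and then measure positional deviations, so it depends on finding a good rigid-body transformation, whereas the distance-based score above depends only on which residues are paired.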

Pitfalls of protein structural alignment

A few empirically parameterized models of structural evolution have been proposed, but most structural aligners are based on "ad hoc" scores of structural similarity. Even so, some "ad hoc" scores have been shown to work. When it comes to evaluating structural alignments, the problem with reference-independent evaluation is that it is simply a test of the similarity between scoring functions, rather than an examination of the actual rotations that promote alignment optimization. It is also crucial to note that not all programs using the same type of score generated similar alignments. Developers therefore need to pay special attention to the robustness of their optimization protocols. Despite these pitfalls, the versatility of the models should advance the study of the interplay between sequence and structure evolution.
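One simple way to quantify how far two alignments of the same protein pair diverge is to count the residue pairs they share. The Python sketch below is only an illustration under stated assumptions: the function name, the representation of an alignment as a list of (residue in A, residue in B) pairs, and the toy data are hypothetical, and this is not claimed to be the evaluation measure used in the paper.

```python
def alignment_agreement(reference, candidate):
    """Fraction of residue pairs in `reference` that also occur in `candidate`.

    Each alignment is a list of (i, j) tuples: residue i of protein A is
    matched to residue j of protein B. Returns a value between 0.0 and 1.0.
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(candidate)) / len(ref)

# Toy alignments (made-up data) from three hypothetical programs.
program_1 = [(0, 0), (1, 1), (2, 2), (3, 3)]
program_2 = [(0, 0), (1, 1), (2, 3)]          # agrees on two of the four pairs
program_3 = [(0, 1), (1, 2), (2, 3)]          # shifted by one residue throughout

print(alignment_agreement(program_1, program_2))
print(alignment_agreement(program_1, program_3))
```

A one-residue shift, as in the third toy alignment, gives no shared pairs at all even though the superimposed structures may look nearly the same, which is one reason why alignments from different optimizers can disagree while scoring similarly.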

Reference

Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol. 2009 Jun;19(3):341-8. Epub 2009 May 27.