Programs Used for Structural Alignment

Structural alignment of thioredoxins from humans and Drosophila melanogaster. The human protein is shown in red, the fly protein in yellow.

The programs and methods used in structural alignment are somewhat complex, but they repay study. Most rely on matrices and other mathematical procedures, and working through them shows both how a structural alignment of proteins is determined and what each method specifically measures.

The goal of structural alignment techniques has been to compare individual structures or sets of structures, or to build "all-to-all" comparison databases that measure the divergence between every pair of structures present in the Protein Data Bank (PDB). The archive is maintained by the Worldwide Protein Data Bank (wwPDB). These databases generally classify proteins based on their fold.

Methods differ in the number of points awarded for each favorable residue pairing in an alignment and the number deducted for each unfavorable one. For instance, glutamine and asparagine are both polar and have very similar hydropathy indices, so if glutamine is present where an asparagine should be, fewer points are deducted than if, say, valine were there. Scoring of this type gives the highest totals to alignments whose differences change the structure or function of the protein the least. The total score of one alignment can then be compared with those of other alignments to judge how closely related the proteins are in structure or function.
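The idea of graded penalties can be sketched in a few lines of Python. The scoring rule below is a hypothetical illustration, not the scheme of any particular program: an identical residue earns a fixed reward, and a substitution is penalized in proportion to the difference in Kyte-Doolittle hydropathy index between the two residues. The function name and the reward and penalty constants are assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy indices for the three residues in the example
# (one-letter codes: Q = glutamine, N = asparagine, V = valine).
HYDROPATHY = {"Q": -3.5, "N": -3.5, "V": 4.2}

MATCH_REWARD = 5.0    # assumed reward for an identical residue
BASE_PENALTY = 1.0    # assumed flat cost of any substitution
PENALTY_SCALE = 1.0   # assumed extra cost per unit of hydropathy difference


def substitution_score(expected, observed):
    """Toy score: reward identity, penalize by hydropathy difference."""
    if expected == observed:
        return MATCH_REWARD
    difference = abs(HYDROPATHY[expected] - HYDROPATHY[observed])
    return -(BASE_PENALTY + PENALTY_SCALE * difference)


# Glutamine in place of asparagine costs less than valine in place of asparagine.
print(substitution_score("N", "Q"))
print(substitution_score("N", "V"))
```

Real scoring schemes use empirically derived tables rather than a single physical property, but the ordering of penalties follows the same logic.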

VMD

One way to compare two structures directly is to use VMD (http://www.ks.uiuc.edu/Research/vmd/). VMD stands for Visual Molecular Dynamics. One can load a PDB file in VMD and then add a second structure through the File menu. VMD uses RMSD-based structural alignment to compare two structures. RMSD stands for root-mean-square deviation (also called root-mean-square distance); it measures the average distance between corresponding atoms of the two structures. The lower the RMSD of two proteins, the better aligned they are.
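The RMSD calculation itself is simple enough to sketch in Python. The function below is a minimal illustration, not VMD's implementation: it assumes the two structures have already been superposed and that the atoms are supplied in matching order as (x, y, z) tuples. The function name `rmsd` is chosen for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that atom i of
    coords_a corresponds to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, each displaced by exactly 1 unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))  # 1.0
```

In practice the optimal superposition has to be found first, since the RMSD of unsuperposed coordinates mostly reflects how the two files happen to be oriented.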

DALI

DALI breaks the input protein structure into hexapeptide fragments and computes a distance matrix by evaluating the contact patterns between successive fragments. The DALI method has been used to identify structural neighbors and for fold classification.
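The two ingredients named above, a distance matrix and hexapeptide fragments, can be illustrated with a short Python sketch. This is not DALI's code: it assumes one representative atom per residue (for example the alpha carbon), and the helper names `distance_matrix` and `fragment_submatrix` are invented for the example.

```python
import math


def distance_matrix(coords):
    """All pairwise distances within one structure (one point per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def fragment_submatrix(dmat, start_i, start_j, size=6):
    """Distances between the hexapeptide starting at residue start_i
    and the hexapeptide starting at residue start_j."""
    return [row[start_j:start_j + size] for row in dmat[start_i:start_i + size]]


# Hypothetical 8-residue chain laid out on a straight line, 3.8 units apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(8)]
dmat = distance_matrix(chain)
block = fragment_submatrix(dmat, 0, 2)
print(len(block), len(block[0]))  # 6 6
```

Comparing such 6 x 6 blocks between two proteins is what allows similar contact patterns to be recognized even when the sequences differ.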

Combinatorial Extension

Combinatorial Extension (CE) breaks each protein into a series of fragments and then attempts to reassemble them into a complete alignment. Fragment pairs can be compared on the basis of structural superposition, inter-residue distances, secondary structure, solvent exposure, hydrogen-bonding patterns, and dihedral angles.

SSAP

SSAP uses double dynamic programming to produce a structural alignment based on atom-to-atom vectors in structure space. In the first step, SSAP constructs a series of inter-residue distance vectors between each residue and its nearest non-contiguous neighbors on each protein. Matrices of the vector differences are then built, and dynamic programming applied to each matrix produces local alignments that are recorded in a summary matrix to determine the overall structural alignment. SSAP scores of 80-100 indicate highly similar structures, whereas scores between 70 and 80 indicate a similar fold with minor variations. Structures scoring 60-70 generally do not share the same fold but usually belong to the same protein class.
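Double dynamic programming applies the same basic dynamic programming step at two levels. The Python sketch below shows only that single building block, in the style of a standard global alignment, and is not SSAP's actual algorithm: given a matrix of similarity scores between positions of two structures, it finds the best-scoring path through the matrix. The function name and the gap penalty are assumptions for illustration.

```python
def best_path_score(score, gap=-1.0):
    """Best global alignment score through a similarity matrix.

    score[i][j] is the similarity of position i in one structure and
    position j in the other; gap is the (assumed) cost of skipping a position.
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                dp[i - 1][j] + gap,                      # skip position i
                dp[i][j - 1] + gap,                      # skip position j
            )
    return dp[n][m]


# Three positions that match one-to-one along the diagonal.
similarity = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
print(best_path_score(similarity))  # 6.0
```

In a double dynamic programming scheme, a step like this would be run once per residue pair at the lower level and once more on the accumulated summary matrix.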

GANGSTA+

GANGSTA+ is a combinatorial algorithm for non-sequential structural alignment of proteins and similarity search in databases. The method focuses on secondary structure and evaluates similarities between two protein structures based on contact maps.

MAMMOTH

MAMMOTH (MAtching Molecular Models Obtained from THeory) was originally developed for comparing models coming from structure prediction, but it also works well with experimental models. MAMMOTH has been used to create a large database of predicted structures of unknown proteins for 150 genomes, which allows for genomic-scale normalization.

RAPIDO

RAPIDO is a web-based program for analyzing the three-dimensional crystal structures of different protein molecules in the presence of conformational changes. The method calculates difference distance matrices between fragments that are structurally similar in the two proteins.
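A difference distance matrix can be sketched in Python as follows. This is a generic illustration of the technique rather than RAPIDO's implementation: it assumes two structures with the same number of equivalent points (one per residue) and takes the absolute difference of their internal distance matrices. The function names are chosen for the example.

```python
import math


def distance_matrix(coords):
    """All pairwise distances within one structure."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def difference_distance_matrix(coords_a, coords_b):
    """Absolute difference between the internal distances of two structures.

    Entries near zero mark pairs of residues whose separation is the same
    in both structures, i.e. parts that move together as a rigid unit.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of points")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(da)
    return [[abs(da[i][j] - db[i][j]) for j in range(n)] for i in range(n)]


# Hypothetical three-point structure in which only the last point moves.
before = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
after = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
for row in difference_distance_matrix(before, after):
    print(row)
```

Because only internal distances are used, no superposition of the two structures is needed before the comparison.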

SABERTOOTH

SABERTOOTH uses structural profiles to perform structural alignments. The tool recognizes structural similarities with an accuracy and quality comparable to those of established alignment tools based on coordinates.

BLOSUM

BLOSUM, which stands for Blocks of Amino Acid Substitution Matrix (also given as BLOcks SUbstitution Matrix), assigns a score to each amino acid substitution based on the observed frequency of that substitution in alignments of related proteins. Scores may be positive or negative. Each score is calculated as a log-odds ratio: in essence, the frequency with which two residues are observed aligned is compared with the frequency expected by chance.
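The log-odds idea can be illustrated with a small Python function. The frequencies below are made-up numbers, not values from any published matrix, and the scale factor of 2 (scores in half-bit units, rounded to integers) is an assumption of this sketch.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score for one residue pair.

    observed: frequency with which the pair is seen aligned in related proteins
    expected: frequency with which the pair would be aligned by chance
    scale:    assumed scaling factor (2.0 gives half-bit units)
    """
    return round(scale * math.log2(observed / expected))


# Hypothetical frequencies for illustration only.
print(log_odds_score(0.04, 0.01))  # pair seen more often than chance -> positive
print(log_odds_score(0.01, 0.04))  # pair seen less often than chance -> negative
print(log_odds_score(0.02, 0.02))  # pair seen as often as chance -> zero
```

A positive score therefore marks a substitution that is tolerated in related proteins, and a negative score marks one that is avoided.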

TOPOFIT

TOPOFIT analyzes protein structures based on three-dimensional Delaunay triangulation patterns derived from the backbone representation. It produces a structural alignment on the premise that proteins have a common spatial invariant part (a set of tetrahedrons), which is described mathematically as a common spatial sub-graph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT).

Insight II

Insight II is a molecular modeling package developed by Biosym. Its programs include Insight II, BioPolymer, Analysis, and Discover. It is therefore a comprehensive package that can not only build any class of molecule or molecular system but, with the molecular mechanics program Discover, also manipulate these same molecules.

Insight II is primarily used for visualization. It creates, modifies, manipulates, displays, and analyzes molecular systems, and it provides the core requirements for all other software modules. Analysis handles mathematical and geometric modeling of molecular properties, which are defined interactively, evaluated dynamically, and visualized through spreadsheets, graphs, and graphic representations. BioPolymer constructs models of polymers (peptides, proteins, carbohydrates, and nucleic acids) for visualizing complex structures and for use in further simulation work. CHARMM is a simulation program available within Insight II that uses energy functions to describe the forces on atoms in molecules. This allows interaction and conformational energies, free energies, and vibrational frequencies to be calculated.

With the Discover program, one can optimize the structure of the molecule or protein being viewed, because Discover incorporates a range of well-validated force fields for dynamics simulations, minimization, and conformational searches. This makes it possible to predict the structure, energetics, and properties of systems, be they organic, inorganic, organometallic, or biological. It is also possible to take the sequence of a protein and extrapolate a rudimentary structure from it. Discover also implements Inter Process Communications, which allows Discover to turn control over to external programs and retrieve their results, incorporating them into continuing Discover computations.