Structural Biochemistry/Protein Evolution

Protein Evolution

The study of protein evolution traces how proteins have changed over time. Such studies have led scientists to determine the relationships between proteins in different species that share similar functions. Conversely, homologous proteins can adapt to perform different functions. Evolution has driven proteins to become more complex, which has led scientists to question the origins of the simpler proteins that preceded modern ones.

Protein evolution is not an independent process; it takes place as part of the evolution of the entire organism. Changes to proteins often occur only at the sequence level, leaving structure and function largely conserved. This helps explain the homology between proteins that share similar structures but have adapted to perform different functions.

Study of Protein Evolution

There are two main approaches to the study of protein evolution. The first is the analysis and comparison of protein sequences to support or refute evolutionary relationships. The second is the simulation of evolutionary processes, either computationally or in in vitro studies.
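As a rough illustration of the sequence-comparison approach, the sketch below computes the percent identity between two protein sequences that have already been aligned. The function name `percent_identity`, the gap character `-`, and the example sequences are assumptions made for illustration only; real studies rely on dedicated alignment tools and statistical models rather than a raw identity count.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs are assumed to be pre-aligned (equal length, '-' marks a gap).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where the residues are identical
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from any real protein.
print(percent_identity("ACDEFG", "ACDEYG"))
```

High identity between two sequences is consistent with, but does not by itself prove, a shared evolutionary origin.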

Stages of Protein Evolution

One proposed scenario is that the earliest proteins were very small polypeptides of about 10 amino acids, specified by small primitive genes made of RNA. RNA predated DNA as the genetic material. The genes coding for these proteins probably joined together in random sequences, while a primitive splicing mechanism pieced together the proteins. Each resulting protein would consist of a single domain with a characteristic length of about 100 amino acids. Further concatenation would lead to multi-domain, and thus more complex, proteins.

In another scenario, protein evolution starts with small peptides of fewer than 10 amino acids. These short peptides are thought to have formed closed loops of 25-30 amino acids, which then assembled into folds of 100-150 amino acids and eventually into multifold proteins. Functionally, the short peptides of fewer than 10 amino acids do not perform any function, whereas the closed-loop proteins are functional.

Protein Domains

Domains are typically 100-150 amino acids (a.a.) in length, and this characteristic length is seen in most proteins. These fold sizes are believed to have appeared during the early stages of the development of DNA genes. Early DNA, which succeeded RNA as the genetic material in many organisms, was primitive and is believed to have existed in circular form. The optimal size for DNA ring closure, which is determined by DNA's flexibility, is believed to be about 400 base pairs. At three base pairs per codon, 400 base pairs can code for roughly 130 amino acids, which falls within the 100-150 range seen in a domain. The upper limit on DNA circularization is thus directly linked to the upper limit on the size of a protein domain. Most proteins can be said to have evolved from these ancient closed-loop units.
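The link between the 400 base pair ring size and the domain length is simple codon arithmetic, sketched below. The function name `max_residues` is hypothetical, and the sketch assumes that the entire ring is coding sequence read in a single frame, ignoring start and stop codons and any non-coding stretches.

```python
CODON_LENGTH = 3  # base pairs per amino acid in the genetic code


def max_residues(base_pairs: int) -> int:
    """Largest number of whole codons (amino acids) that fit in a DNA segment."""
    return base_pairs // CODON_LENGTH


# A ring of about 400 base pairs, the figure quoted in the text.
print(max_residues(400))  # 133
```

Under these assumptions a 400 base pair ring encodes at most 133 residues, consistent with the 100-150 amino acid range quoted for domains.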

PDZ Domain

The PDZ domain is a common domain found in signaling proteins of bacteria, plants, and animals; PDZ domains are widespread in eukaryotes and eubacteria. At approximately 90 residues long, they form regions of sequence homology shared among diverse signaling proteins. Generally, a PDZ domain binds a short region at the C-terminus of a partner protein. This short region binds via beta-sheet augmentation: the beta sheet of the PDZ domain is extended by the addition of a beta strand from the terminus of the binding partner. PDZ domains usually occur in combination with other interaction modules and contribute to the specificity of receptor tyrosine kinase-mediated signaling. They are also involved in other cellular functions such as protein trafficking, synaptic signal coordination, and the initiation of cell polarity.

SH3 Domain

The SRC Homology 3 (SH3) domain is a relatively small protein domain of about 60 amino acids. SH3 domains regulate the activity state of adaptor proteins and tyrosine kinases. They also increase the substrate specificity of tyrosine kinases by binding at sites far from the active site. The SH3 domain has a beta-barrel fold made up of 5-6 beta strands organized into tightly packed anti-parallel beta sheets. It is a classical fold that is common in both eukaryotes and prokaryotes.

WD40 Domain (WD-Repeat)

The WD40 domain is one of the most abundant and most active domains in eukaryotes. WD40 domains are deeply involved in cellular processes, where they play a crucial role as hubs in cellular networks. They mediate diverse protein-protein interactions, especially those that involve scaffolding, and take part in processes such as signal transduction, cell division, chemotaxis, RNA processing, and cytoskeleton construction. WD40 domains were first discovered in bovine beta-transducin, a subunit of the trimeric G protein transducin complex. A WD40 domain contains a series of repeated sequences of approximately 44-60 residues each, which fold into a seven-bladed beta-propeller. Each blade is a four-stranded anti-parallel beta sheet.

WD40 domains appear to be favored over other candidate domains because of their structure: they form highly symmetrical structures in comparison to other domains involved in intracellular processes. This symmetry is of high importance when proteins that lack sequence similarity need to adopt the same fold. Additionally, symmetrical folds allow rapid and convenient folding, especially for folds composed of discrete, local, non-interlocking units of secondary structure.

Unfortunately, WD40 domains have proven difficult to study. This is mainly because they are usually subunits of a larger assembly. Moreover, they lack a measurable intrinsic activity such as catalysis. Regardless, WD40 domains act as scaffolds and clearly represent one of the most significant domain families in cellular processes.
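The repeat figures above imply an approximate overall size for the propeller, which the sketch below works out. The function name `propeller_length_range` is hypothetical, and the calculation assumes that each of the seven blades corresponds to exactly one repeat of 44-60 residues, ignoring any linkers or terminal extensions.

```python
def propeller_length_range(blades: int, repeat_min: int, repeat_max: int) -> tuple[int, int]:
    """Shortest and longest total length (in residues) for a given number of repeats."""
    return blades * repeat_min, blades * repeat_max


# Seven blades of 44-60 residues each, the values quoted in the text.
shortest, longest = propeller_length_range(7, 44, 60)
print(shortest, longest)  # 308 420
```

Under these assumptions a seven-bladed WD40 propeller spans roughly 300-400 residues, considerably larger than the PDZ (about 90 residues) and SH3 (about 60 residues) domains described earlier.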