# An Introduction to Molecular Biology/Function and structure of Proteins

Proteins were first described by the Dutch chemist Gerhardus Johannes Mulder and named by the Swedish chemist Jöns Jakob Berzelius in 1838. Early nutritional scientists such as the German Carl von Voit believed that protein was the most important nutrient for maintaining the structure of the body, because it was generally believed that "flesh makes flesh."

The amino acids in a polypeptide chain are linked by peptide bonds. Once linked in the protein chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms are known as the main chain or protein backbone. The peptide bond has two resonance forms that contribute some double-bond character and inhibit rotation around its axis, so that the alpha carbons are roughly coplanar. The other two dihedral angles in the peptide bond determine the local shape assumed by the protein backbone. The end of the protein with a free carboxyl group is known as the C-terminus or carboxy terminus, whereas the end with a free amino group is known as the N-terminus or amino terminus.

The words protein, polypeptide, and peptide are a little ambiguous and can overlap in meaning. Protein is generally used to refer to the complete biological molecule in a stable conformation, whereas peptide is generally reserved for short amino acid oligomers that often lack a stable three-dimensional structure. However, the boundary between the two is not well defined and usually lies near 20–30 residues. Polypeptide can refer to any single linear chain of amino acids, usually regardless of length, but often implies an absence of a defined conformation[1].

The crystal structure of bovine cytochrome c oxidase in a phospholipid bilayer. The intermembrane space lies at the top of the image. PDB 1OCC.

## Amino acids

CO-R-N rule

There are 22 standard amino acids, but only 21 are found in eukaryotes. Of the 22, 20 are directly encoded by the universal genetic code. Humans can synthesize 11 of these 20 from each other or from other molecules of intermediary metabolism. The other 9 must be consumed in the diet, and so are called essential amino acids; those are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. The remaining two, selenocysteine and pyrrolysine, are incorporated into proteins by unique synthetic mechanisms.

Each α-amino acid consists of a backbone part that is present in all the amino acid types, and a side chain that is unique to each type of residue. An exception to this rule is proline, where a hydrogen atom of the amino group is replaced by a bond to the side chain. Because the Cα atom is bound to four different groups it is chiral; however, only one of the isomers occurs in biological proteins. Glycine, however, is not chiral since its side chain is a hydrogen atom. A simple mnemonic for the correct L-form is "CORN": when the Cα atom is viewed with the H in front, the groups read "CO-R-N" in a clockwise direction[2].

Isomerism

Of the standard α-amino acids, all but glycine can exist as either of two optical isomers, called L- or D-amino acids, which are mirror images of each other. While L-amino acids represent all of the amino acids found in proteins during translation in the ribosome, D-amino acids are found in some proteins produced by enzymatic posttranslational modifications after translation and translocation to the endoplasmic reticulum, as in exotic sea-dwelling organisms such as cone snails. They are also abundant components of the peptidoglycan cell walls of bacteria, and D-serine may act as a neurotransmitter in the brain. The L and D convention for amino acid configuration refers not to the optical activity of the amino acid itself, but rather to the optical activity of the isomer of glyceraldehyde from which that amino acid can theoretically be synthesized (D-glyceraldehyde is dextrorotatory; L-glyceraldehyde is levorotatory). Alternatively, the (S) and (R) designators are used to indicate the absolute stereochemistry. Almost all of the amino acids in proteins are (S) at the α carbon, with cysteine being (R) and glycine non-chiral. Cysteine is unusual because it has a sulfur atom at the second position of its side chain; sulfur has a larger atomic mass than the atoms found at that position in the other standard amino acids, which changes the priority order of the groups around the α carbon and hence gives (R) instead of (S)[3].

Zwitterions

The amine and carboxylic acid functional groups found in amino acids give them amphiprotic properties. At a certain pH, known as the isoelectric point, an amino acid has no overall charge, since the number of protonated ammonium groups (positive charges) and deprotonated carboxylate groups (negative charges) is equal. The amino acids all have different isoelectric points. The ions produced at the isoelectric point carry both positive and negative charges and are known as zwitterions, from the German word Zwitter meaning "hermaphrodite" or "hybrid". Amino acids can exist as zwitterions in solids and in polar solutions such as water, but not in the gas phase. Zwitterions have minimal solubility at their isoelectric point, so an amino acid can be isolated by precipitating it from water by adjusting the pH to its particular isoelectric point[4].
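As a rough illustration of the isoelectric point, the sketch below estimates the average net charge of an amino acid without an ionizable side chain from the Henderson-Hasselbalch relation, then searches for the pH at which that charge is zero. The function names are hypothetical, and the pKa values used for glycine (about 2.34 and 9.60) are approximate textbook figures supplied as an assumption; they do not come from this chapter.

```python
def net_charge(ph, pka_carboxyl, pka_amino):
    """Average net charge of an amino acid with no ionizable side chain."""
    # Fraction of amino groups that are protonated (+1 each).
    positive = 1.0 / (1.0 + 10 ** (ph - pka_amino))
    # Fraction of carboxyl groups that are deprotonated (-1 each).
    negative = 1.0 / (1.0 + 10 ** (pka_carboxyl - ph))
    return positive - negative


def isoelectric_point(pka_carboxyl, pka_amino, tol=1e-6):
    """Find the pH at which the net charge is zero, by bisection."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # The net charge falls as pH rises, so a positive charge
        # means the isoelectric point lies above mid.
        if net_charge(mid, pka_carboxyl, pka_amino) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Assumed, approximate pKa values for glycine.
glycine_pi = isoelectric_point(2.34, 9.60)
```

For this simple two-group case the result coincides with the average of the two pKa values; amino acids with ionizable side chains would need a third term in `net_charge`.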

The 20 naturally occurring amino acids have different physical and chemical properties, including their electrostatic charge, pKa, hydrophobicity, size and specific functional groups. These properties play a major role in molding protein structure. The salient features of amino acids are described below in the table.

| Amino Acid | 1-letter | 3-letter | Remarks |
|---|---|---|---|
| Alanine | A | Ala | Very abundant, very versatile. More stiff than glycine, but small enough to pose only small steric limits for the protein conformation. It behaves fairly neutrally, and can be located in both hydrophilic regions on the protein outside and the hydrophobic areas inside. |
| Asparagine or aspartic acid | B | Asx | A placeholder when either amino acid may occupy a position. |
| Cysteine | C | Cys | The sulfur atom bonds readily to heavy metal ions. Under oxidizing conditions, two cysteines can join together in a disulfide bond to form the amino acid cystine. When cystines are part of a protein, insulin for example, the tertiary structure is stabilized, which makes the protein more resistant to denaturation; therefore, disulfide bonds are common in proteins that have to function in harsh environments, including digestive enzymes (e.g., pepsin and chymotrypsin) and structural proteins (e.g., keratin). Disulfides are also found in peptides too small to hold a stable shape on their own (e.g., insulin). |
| Aspartic acid | D | Asp | Behaves similarly to glutamic acid. Carries a hydrophilic acidic group with strong negative charge. Usually is located on the outer surface of the protein, making it water-soluble. Binds to positively charged molecules and ions, often used in enzymes to fix the metal ion. When located inside of the protein, aspartate and glutamate are usually paired with arginine and lysine. |
| Glutamic acid | E | Glu | Behaves similarly to aspartic acid. Has a longer, slightly more flexible side chain. |
| Phenylalanine | F | Phe | Essential for humans. Phenylalanine, tyrosine, and tryptophan contain a large rigid aromatic group on the side chain. These are the biggest amino acids. Like isoleucine, leucine and valine, these are hydrophobic and tend to orient towards the interior of the folded protein molecule. Phenylalanine can be converted into tyrosine. |
| Glycine | G | Gly | Because of the two hydrogen atoms at the α carbon, glycine is not optically active. It is the smallest amino acid, rotates easily, and adds flexibility to the protein chain. It is able to fit into the tightest spaces, e.g., the triple helix of collagen. As too much flexibility is usually not desired, as a structural component it is less common than alanine. |
| Histidine | H | His | In even slightly acidic conditions protonation of the nitrogen occurs, changing the properties of histidine and the polypeptide as a whole. It is used by many proteins as a regulatory mechanism, changing the conformation and behavior of the polypeptide in acidic regions such as the late endosome or lysosome, enforcing conformation change in enzymes. However, only a few histidines are needed for this, so it is comparatively scarce. |
| Isoleucine | I | Ile | Essential for humans. Isoleucine, leucine and valine have large aliphatic hydrophobic side chains. Their molecules are rigid, and their mutual hydrophobic interactions are important for the correct folding of proteins, as these chains tend to be located inside of the protein molecule. |
| Leucine or isoleucine | J | Xle | A placeholder when either amino acid may occupy a position. |
| Lysine | K | Lys | Essential for humans. Behaves similarly to arginine. Contains a long flexible side chain with a positively charged end. The flexibility of the chain makes lysine and arginine suitable for binding to molecules with many negative charges on their surfaces. E.g., DNA-binding proteins have their active regions rich with arginine and lysine. The strong charge makes these two amino acids prone to be located on the outer hydrophilic surfaces of the proteins; when they are found inside, they are usually paired with a corresponding negatively charged amino acid, e.g., aspartate or glutamate. |
| Leucine | L | Leu | Essential for humans. Behaves similarly to isoleucine and valine. See isoleucine. |
| Methionine | M | Met | Essential for humans. Always the first amino acid to be incorporated into a protein; sometimes removed after translation. Like cysteine, contains sulfur, but with a methyl group instead of hydrogen. This methyl group can be activated, and is used in many reactions where a new carbon atom is being added to another molecule. |
| Asparagine | N | Asn | Similar to aspartic acid. Asn contains an amide group where Asp has a carboxyl. |
| Pyrrolysine | O | Pyl | Similar to lysine, with a pyrroline ring attached. |
| Proline | P | Pro | Contains an unusual ring to the N-end amine group, which forces the CO-NH amide sequence into a fixed conformation. Can disrupt protein folding structures like α helix or β sheet, forcing the desired kink in the protein chain. Common in collagen, where it often undergoes a posttranslational modification to hydroxyproline. |
| Glutamine | Q | Gln | Similar to glutamic acid. Gln contains an amide group where Glu has a carboxyl. Used in proteins and as a storage for ammonia. The most abundant amino acid in the body. |
| Arginine | R | Arg | Functionally similar to lysine. |
| Serine | S | Ser | Serine and threonine have a short group ended with a hydroxyl group. Its hydrogen is easy to remove, so serine and threonine often act as hydrogen donors in enzymes. Both are very hydrophilic, therefore the outer regions of soluble proteins tend to be rich with them. |
| Threonine | T | Thr | Essential for humans. Behaves similarly to serine. |
| Selenocysteine | U | Sec | Selenated form of cysteine, which replaces sulfur. |
| Valine | V | Val | Essential for humans. Behaves similarly to isoleucine and leucine. See isoleucine. |
| Tryptophan | W | Trp | Essential for humans. Behaves similarly to phenylalanine and tyrosine (see phenylalanine). Precursor of serotonin. Naturally fluorescent. |
| Unknown | X | Xaa | Placeholder when the amino acid is unknown or unimportant. |
| Tyrosine | Y | Tyr | Behaves similarly to phenylalanine (precursor to tyrosine) and tryptophan (see phenylalanine). Precursor of melanin, epinephrine, and thyroid hormones. Naturally fluorescent, although fluorescence is usually quenched by energy transfer to tryptophans. |
| Glutamic acid or glutamine | Z | Glx | A placeholder when either amino acid may occupy a position. |

### Classification of amino acids

The 20 amino acids encoded directly by the genetic code can be divided into several groups based on their properties. Important factors are charge, hydrophilicity or hydrophobicity, size and functional groups. Amino acids are usually classified by the properties of their side chain into four groups. The side chain can make an amino acid a weak acid or a weak base, and a hydrophile if the side chain is polar or a hydrophobe if it is nonpolar.

An α-amino acid. The CαH atom is omitted in the diagram.

Protein amino acids are combined into a single polypeptide chain in a condensation reaction. This reaction is catalysed by the ribosome in a process known as translation.

| Essential | Nonessential |
|---|---|
| Isoleucine | Alanine |
| Leucine | Asparagine |
| Lysine | Aspartic Acid |
| Methionine | Cysteine* |
| Phenylalanine | Glutamic Acid |
| Threonine | Glutamine* |
| Tryptophan | Glycine* |
| Valine | Proline* |
| | Selenocysteine* |
| | Serine* |
| | Tyrosine* |
| | Arginine* |
| | Histidine* |
| | Ornithine* |
| | Taurine* |

Polar and non polar amino acids and their single and three letter code

| Amino Acid | Three Letter code | Single Letter code | Side chain polarity | Side chain charge (pH 7.4) | Hydropathy index | Absorbance λmax (nm) | ε at λmax (×10⁻³ M⁻¹ cm⁻¹) |
|---|---|---|---|---|---|---|---|
| Alanine | Ala | A | nonpolar | neutral | 1.8 | | |
| Arginine | Arg | R | polar | positive | −4.5 | | |
| Asparagine | Asn | N | polar | neutral | −3.5 | | |
| Aspartic acid | Asp | D | polar | negative | −3.5 | | |
| Cysteine | Cys | C | nonpolar | neutral | 2.5 | 250 | 0.3 |
| Glutamic acid | Glu | E | polar | negative | −3.5 | | |
| Glutamine | Gln | Q | polar | neutral | −3.5 | | |
| Glycine | Gly | G | nonpolar | neutral | −0.4 | | |
| Histidine | His | H | polar | positive (10%), neutral (90%) | −3.2 | 211 | 5.9 |
| Isoleucine | Ile | I | nonpolar | neutral | 4.5 | | |
| Leucine | Leu | L | nonpolar | neutral | 3.8 | | |
| Lysine | Lys | K | polar | positive | −3.9 | | |
| Methionine | Met | M | nonpolar | neutral | 1.9 | | |
| Phenylalanine | Phe | F | nonpolar | neutral | 2.8 | 257, 206, 188 | 0.2, 9.3, 60.0 |
| Proline | Pro | P | nonpolar | neutral | −1.6 | | |
| Serine | Ser | S | polar | neutral | −0.8 | | |
| Threonine | Thr | T | polar | neutral | −0.7 | | |
| Tryptophan | Trp | W | nonpolar | neutral | −0.9 | 280, 219 | 5.6, 47.0 |
| Tyrosine | Tyr | Y | polar | neutral | −1.3 | 274, 222, 193 | 1.4, 8.0, 48.0 |
| Valine | Val | V | nonpolar | neutral | 4.2 | | |
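One common use of the hydropathy index is to average it over a sequence as a crude measure of how hydrophobic a peptide is overall. The following is a minimal sketch of that averaging, using the index values from the table above; the function name `mean_hydropathy` and the example sequences are hypothetical.

```python
# Hydropathy index values copied from the table above,
# keyed by single-letter code.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy index over a one-letter-code sequence."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    values = [HYDROPATHY[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


# Hypothetical example peptides: one aliphatic, one charged.
aliphatic_score = mean_hydropathy("AILV")  # positive, hydrophobic
charged_score = mean_hydropathy("DEKR")    # negative, hydrophilic
```

A positive average suggests a mostly hydrophobic stretch and a negative one a mostly hydrophilic stretch, although a single average says nothing about where along the chain those residues sit.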

Two further amino acids are incorporated by overriding stop codons:

| 21st and 22nd amino acids | 3-Letter | 1-Letter |
|---|---|---|
| Selenocysteine | Sec | U |
| Pyrrolysine | Pyl | O |

In addition to the specific amino acid codes, placeholders are used in cases where chemical or crystallographic analysis of a peptide or protein cannot conclusively determine the identity of a residue.

| Ambiguous Amino Acids | 3-Letter | 1-Letter |
|---|---|---|
| Asparagine or aspartic acid | Asx | B |
| Glutamine or glutamic acid | Glx | Z |
| Leucine or isoleucine | Xle | J |
| Unspecified or unknown amino acid | Xaa | X |

Unk is sometimes used instead of Xaa, but is less standard.
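The one-letter and three-letter codes listed above map onto each other directly, which makes conversion a simple table lookup. The sketch below is one possible way to do it; the names `THREE_LETTER`, `to_three_letter` and `has_ambiguous_residue` are hypothetical, and joining the codes with hyphens is a formatting choice, not a requirement.

```python
# One-letter to three-letter codes, as listed in the tables above:
# the 20 directly encoded amino acids, Sec and Pyl, and the placeholders.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "U": "Sec", "O": "Pyl",
    "B": "Asx", "Z": "Glx", "J": "Xle", "X": "Xaa",
}

# Placeholder codes that do not identify a single amino acid.
AMBIGUOUS = {"B", "Z", "J", "X"}


def to_three_letter(sequence):
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[code] for code in sequence.upper())


def has_ambiguous_residue(sequence):
    """Return True if any position uses a placeholder code."""
    return any(code in AMBIGUOUS for code in sequence.upper())
```

For example, `to_three_letter("MKU")` gives `"Met-Lys-Sec"`, and a sequence containing `B` or `X` is flagged by `has_ambiguous_residue`.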

Additionally, many non-standard amino acids have a specific code. For example, several peptide drugs, such as Bortezomib or MG132, are artificially synthesized and retain their protecting groups, which have specific codes. Bortezomib is Pyz-Phe-boroLeu and MG132 is Z-Leu-Leu-Leu-al. To aid in the analysis of protein structure, photocrosslinking amino acid analogues are also available. These include photoleucine (pLeu) and photomethionine (pMet).[5]

## Peptide bond

The condensation of two amino acids to form a peptide bond

A peptide bond (amide bond) is a covalent chemical bond formed between two molecules when the carboxyl group of one molecule reacts with the amino group of the other molecule, thereby releasing a molecule of water (H2O). This is a dehydration synthesis reaction (also known as a condensation reaction), and usually occurs between amino acids. The resulting C(O)NH bond is called a peptide bond, and the resulting molecule is an amide. The four-atom functional group -C(=O)NH- is called a peptide link. Polypeptides and proteins are chains of amino acids held together by peptide bonds, as is the backbone of PNA.

A peptide bond can be broken by amide hydrolysis (the adding of water). The peptide bonds in proteins are metastable, meaning that in the presence of water they will break spontaneously, releasing 2-4 kcal/mol of free energy, but this process is extremely slow. In living organisms, the process is facilitated by enzymes. Living organisms also employ enzymes to form peptide bonds; this process requires free energy. The wavelength of absorbance for a peptide bond is 190-230 nm.
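Because each peptide bond is formed with the loss of one water molecule, the molar mass of a linear peptide can be estimated from the masses of its free amino acids. The sketch below illustrates that bookkeeping only. The function name is hypothetical, and the molar masses used (glycine about 75.07 g/mol, alanine about 89.09 g/mol, water about 18.015 g/mol) are approximate values supplied as an assumption rather than taken from this chapter.

```python
WATER_MASS = 18.015  # g/mol, approximate (assumed value)


def linear_peptide_mass(free_amino_acid_masses):
    """Molar mass of a linear peptide built from the given free amino acids.

    A chain of n amino acids contains n - 1 peptide bonds, and each
    bond is formed by releasing one molecule of water.
    """
    n = len(free_amino_acid_masses)
    if n == 0:
        raise ValueError("at least one amino acid is required")
    return sum(free_amino_acid_masses) - (n - 1) * WATER_MASS


# Glycine + alanine -> dipeptide + H2O (approximate, assumed masses).
dipeptide_mass = linear_peptide_mass([75.07, 89.09])
```

Hydrolysis is the same calculation run in reverse: adding one water molecule per broken bond recovers the masses of the free amino acids.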

The peptide bond tends to be planar due to the delocalization of the electrons from the double bond. The rigid peptide dihedral angle, ω (the bond between C1 and N), is always close to 180 degrees. The dihedral angles phi, φ (the bond between N and Cα), and psi, ψ (the bond between Cα and C1), can take a certain range of possible values. These angles are the internal degrees of freedom of a protein; they control the protein's conformation. They are restrained by geometry to allowed ranges typical for particular secondary structure elements, and are represented in a Ramachandran plot. A few important bond lengths are given in the table below[6].

| Peptide bond | Average length | Single bond | Average length | Hydrogen bond | Average (±30) |
|---|---|---|---|---|---|
| Cα - C | 153 pm | C - C | 154 pm | O-H --- O-H | 280 pm |
| C - N | 133 pm | C - N | 148 pm | N-H --- O=C | 290 pm |
| N - Cα | 146 pm | C - O | 143 pm | O-H --- O=C | 280 pm |
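The dihedral angles ω, φ and ψ described above are each defined by four consecutive backbone atoms, so they can be computed from atomic coordinates. The sketch below uses a standard vector formula for a signed dihedral angle. The function names and the test coordinates are hypothetical; in practice the four points would be, for example, Cα, C, N, Cα for ω, or N, Cα, C and the following N for ψ.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 degrees is the cis arrangement and 180 degrees is trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a planar zigzag (trans) and a planar U (cis).
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters because φ and ψ are plotted over the full range from -180 to 180 degrees.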

### β-peptides

In α amino acids (molecule at left), both the carboxylic acid group (red) and the amino group (blue) are bonded to the same carbon center, termed the α carbon ($\mathrm{C}^{\alpha}$) because it is one atom away from the carboxylate group. In β amino acids, the amino group is bonded to the β carbon ($\mathrm{C}^{\beta}$), which is found in most of the 20 standard amino acids. Only glycine lacks a β carbon, which means that β-glycine is not possible.

The chemical synthesis of β amino acids can be challenging, especially given the diversity of functional groups bonded to the β carbon and the necessity of maintaining chirality. In the alanine molecule shown, the β carbon is achiral; however, most larger amino acids have a chiral $\mathrm{C}^{\beta}$ atom. A number of synthesis mechanisms have been introduced to efficiently form β amino acids and their derivatives[7][8], notably those based on the Arndt-Eistert synthesis.

Two main types of β-peptides exist: those with the organic residue (R) next to the amine are called β3-peptides and those with the residue next to the carbonyl group are called β2-peptides.[9]

## Enzymes

Enzymes are generally globular proteins and range from just 62 amino acid residues in size, for the monomer of 4-oxalocrotonate tautomerase, to over 2,500 residues in the animal fatty acid synthase. A small number of RNA-based biological catalysts exist, with the most common being the ribosome; these are referred to as either RNA-enzymes or ribozymes. The activities of enzymes are determined by their three-dimensional structure. However, although structure does determine function, predicting a novel enzyme's activity just from its structure is a very difficult problem that has not yet been solved.

Most enzymes are much larger than the substrates they act on, and only a small portion of the enzyme (around 3–4 amino acids) is directly involved in catalysis. The region that contains these catalytic residues, binds the substrate, and then carries out the reaction is known as the active site. Enzymes can also contain sites that bind cofactors, which are needed for catalysis. Some enzymes also have binding sites for small molecules, which are often direct or indirect products or substrates of the reaction catalyzed. This binding can serve to increase or decrease the enzyme's activity, providing a means for feedback regulation. Like all proteins, enzymes are long, linear chains of amino acids that fold to produce a three-dimensional product. Each unique amino acid sequence produces a specific structure, which has unique properties. Individual protein chains may sometimes group together to form a protein complex. Most enzymes can be denatured (that is, unfolded and inactivated) by heating or chemical denaturants, which disrupt the three-dimensional structure of the protein. Depending on the enzyme, denaturation may be reversible or irreversible. Structures of enzymes in complex with substrates or substrate analogs during a reaction may be obtained using time-resolved crystallography methods[10].

### Classification of enzymes

An enzyme's name is often derived from its substrate or the chemical reaction it catalyzes, with the word ending in -ase. Examples are lactase, alcohol dehydrogenase and DNA polymerase. This may result in different enzymes with the same function, called isozymes (or isoenzymes), having the same basic name. Isozymes are homologous proteins that have different amino acid sequences and might be distinguished by their optimal pH, by their kinetic properties, or immunologically. Furthermore, the normal physiological reaction an enzyme catalyzes may not be the same as under artificial conditions. This can result in the same enzyme being identified with two different names. For example, glucose isomerase, used industrially to convert glucose into the sweetener fructose, is a xylose isomerase in vivo.

The International Union of Biochemistry and Molecular Biology has developed a nomenclature for enzymes, the EC numbers. The Enzyme Commission number (EC number) is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. As a system of enzyme nomenclature, every EC number is associated with a recommended name for the respective enzyme. Each enzyme is described by a sequence of four numbers preceded by "EC". The first number broadly classifies the enzyme based on its mechanism. Strictly speaking, EC numbers do not specify enzymes, but enzyme-catalyzed reactions. If different enzymes (for instance from different organisms) catalyze the same reaction, then they receive the same EC number. By contrast, UniProt identifiers uniquely specify a protein by its amino acid sequence[11].

EC 1 Oxidoreductases: catalyze oxidation/reduction reactions

EC 2 Transferases: transfer a functional group (e.g. a methyl or phosphate group)

EC 3 Hydrolases: catalyze the hydrolysis of various bonds

EC 4 Lyases: cleave various bonds by means other than hydrolysis and oxidation

EC 5 Isomerases: catalyze isomerization changes within a single molecule

EC 6 Ligases: join two molecules with covalent bonds.

Top-level EC numbers[12]

| Group | Reaction catalyzed | Typical reaction | Enzyme example(s) with trivial name |
|---|---|---|---|
| EC 1 Oxidoreductases | To catalyze oxidation/reduction reactions; transfer of H and O atoms or electrons from one substance to another | AH + B → A + BH (reduced); A + O → AO (oxidized) | Dehydrogenase, oxidase |
| EC 2 Transferases | Transfer of a functional group from one substance to another. The group may be a methyl, acyl, amino or phosphate group | AB + C → A + BC | Transaminase, kinase |
| EC 3 Hydrolases | Formation of two products from a substrate by hydrolysis | AB + H2O → AOH + BH | Lipase, amylase, peptidase |
| EC 4 Lyases | Non-hydrolytic addition or removal of groups from substrates. C-C, C-N, C-O or C-S bonds may be cleaved | RCOCOOH → RCOH + CO2 or [X-A-B-Y] → [A=B + X-Y] | Decarboxylase |
| EC 5 Isomerases | Intramolecular rearrangement, i.e. isomerization changes within a single molecule | AB → BA | Isomerase, mutase |
| EC 6 Ligases | Join together two molecules by synthesis of new C-O, C-S, C-N or C-C bonds with simultaneous breakdown of ATP | X + Y + ATP → XY + ADP + Pi | Synthetase |
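Since the first of the four numbers identifies the top-level class, that class can be read straight off an EC number. The following is a small sketch of such a lookup, covering only the six classes listed in this section; the function name and the input handling (an optional "EC" prefix, four dot-separated fields) are assumptions made for illustration.

```python
# The six top-level classes listed in this section.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number):
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has four dot-separated fields")
    first = int(fields[0])
    if first not in EC_CLASSES:
        raise ValueError("class %d is not one of the six listed here" % first)
    return EC_CLASSES[first]


# Alcohol dehydrogenase is catalogued as EC 1.1.1.1.
adh_class = ec_top_level_class("EC 1.1.1.1")
```

Because EC numbers describe reactions rather than proteins, two unrelated enzymes that catalyze the same reaction would return the same class here.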

### Oxidoreductase

In molecular biology and biochemistry, an oxidoreductase is an enzyme that catalyzes the transfer of electrons from one molecule (the reductant, also called the hydrogen or electron donor) to another (the oxidant, also called the hydrogen or electron acceptor). This group of enzymes usually utilizes NADP or NAD as cofactors. In general, polypeptides are unbranched polymers, so their primary structure can often be specified by the sequence of amino acids along their backbone. However, proteins can become cross-linked, most commonly by disulfide bonds, and the primary structure also requires specifying the cross-linking atoms, e.g., specifying the cysteines involved in the protein's disulfide bonds. Other crosslinks include desmosine... The chiral centers of a polypeptide chain can undergo racemization. In particular, the L-amino acids normally found in proteins can spontaneously isomerize at the Cα atom to form D-amino acids, which cannot be cleaved by most proteases[13].

## Structure of protein

A Ramachandran plot generated from the protein PCNA, a human DNA clamp protein that is composed of both beta sheets and alpha helices (PDB ID 1AXC). Points that lie on the axes indicate N- and C-terminal residues for each subunit. The green regions show possible angle formations that include glycine, while the blue areas are for formations that don't include glycine.

### Primary structure of protein

Ramachandran diagram (φ,ψ plot), with data points for α-helical residues forming a dense diagonal cluster below and left of center, around the global energy minimum for backbone conformation.[14]

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux[15].

Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitations would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules. Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproved in the 1920s by ultracentrifugation measurements by Theodor Svedberg that showed that proteins had a well-defined, reproducible molecular weight and by electrophoretic measurements by Arne Tiselius that indicated that proteins were single molecules.

A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement, C=O + HN → C(OH)-N, that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproved when Frederick Sanger successfully sequenced insulin and by the crystallographic determination of myoglobin and hemoglobin by Max Perutz and John Kendrew.

The primary structure of peptides and proteins refers to the linear sequence of their amino acid structural units. The term "primary structure" was coined by Linderstrøm-Lang in 1951. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. The post-translational modifications of a protein, such as disulfide formation, phosphorylations and glycosylations, are usually also considered a part of the primary structure, and cannot be read from the gene.

### Secondary structure of protein

The Hemoglobin molecule has four heme-binding subunits, each largely made of alpha helices.

Secondary structure refers to highly regular local sub-structures. Two main types of secondary structure, the alpha helix and the beta strand, were suggested in 1951 by Linus Pauling and coworkers[16]. These secondary structures are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, being constrained to specific values of the dihedral angles ψ and φ on the Ramachandran plot. Both the alpha helix and the beta sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. They should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".[17]

Beta-meander motif
Portion of outer surface Protein A of Borrelia burgdorferi complexed with a murine monoclonal antibody.

Amino acids vary in their ability to form the various secondary structure elements. Proline and glycine are sometimes known as "helix breakers" because they disrupt the regularity of the α helical backbone conformation; however, both have unusual conformational abilities and are commonly found in turns. Amino acids that prefer to adopt helical conformations in proteins include methionine, alanine, leucine, glutamate and lysine ("MALEK" in amino-acid 1-letter codes); by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and Cβ-branched amino acids (isoleucine, valine, and threonine) prefer to adopt β-strand conformations. However, these preferences are not strong enough to produce a reliable method of predicting secondary structure from sequence alone.

Secondary structure in proteins consists of local inter-residue interactions, which may or may not be mediated by hydrogen bonds. The most common secondary structures are alpha helices and beta sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically favorable hydrogen-bonding patterns but are rarely if ever observed in natural proteins except at the ends of α helices, due to unfavorable backbone packing in the center of the helix.

A Ramachandran plot (also known as a Ramachandran map, a Ramachandran diagram or a [φ,ψ] plot), developed by Gopalasamudram Narayana Ramachandran and Viswanathan Sasisekharan, is a way to visualize the dihedral angles ψ against φ of amino acid residues in protein structure[18]. It shows the possible conformations of ψ and φ angles for a polypeptide.

Mathematically, the Ramachandran plot is the visualization of a function $f : [-\pi, \pi) \times [-\pi, \pi) \rightarrow \mathbb{R}_{+}$. The domain of this function is the torus. Hence, the conventional Ramachandran plot is a projection of the torus on the plane, resulting in a distorted view and the presence of discontinuities.

One would expect that larger side chains would result in more restrictions and consequently a smaller allowable region in the Ramachandran plot. In practice this does not appear to be the case; only the methylene group at the α position has an influence. Glycine has a hydrogen atom, with a smaller van der Waals radius, instead of a methyl group at the α position. Hence it is least restricted and this is apparent in the Ramachandran plot for glycine, for which the allowable area is considerably larger. In contrast, the Ramachandran plot for proline shows only a very limited number of possible combinations of ψ and φ.

The Ramachandran plot was calculated just before the first protein structures at atomic resolution were determined. Forty years later there were tens of thousands of high-resolution protein structures determined by X-ray crystallography and deposited in the Protein Data Bank (PDB). From one thousand different protein chains, Ramachandran plots of over 200 000 amino acids were plotted, showing some significant differences, especially for glycine (Hovmöller et al. 2002). The upper left region was found to be split into two; one to the left containing amino acids in beta sheets and one to the right containing the amino acids in random coil of this conformation. One can also plot the dihedral angles in polysaccharides and other polymers in this fashion. For the first two protein side-chain dihedral angles a similar plot is the Janin Plot.
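One practical consequence of the domain being a torus is that two conformations near opposite edges of the flat plot, such as φ = 179° and φ = -179°, are actually close together. The sketch below shows one way to account for this by wrapping angle differences into the interval [-π, π) before measuring distance. The function names are hypothetical, and the Euclidean combination of the two wrapped differences is just one reasonable choice of metric.

```python
import math


def wrap_angle(angle):
    """Map an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def torus_distance(phi_psi_a, phi_psi_b):
    """Distance between two (phi, psi) pairs, in radians, on the torus."""
    d_phi = wrap_angle(phi_psi_a[0] - phi_psi_b[0])
    d_psi = wrap_angle(phi_psi_a[1] - phi_psi_b[1])
    return math.hypot(d_phi, d_psi)


# Two points on opposite edges of the flat plot, same psi (hypothetical).
a = (math.radians(179.0), math.radians(60.0))
b = (math.radians(-179.0), math.radians(60.0))
separation = torus_distance(a, b)  # 2 degrees in radians, not 358
```

The same wrapping is what removes the apparent discontinuities when clusters straddle the ±180° borders of the conventional plot.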

α helix

The amino acids in an α helix are arranged in a right-handed helical structure where each amino acid residue corresponds to a 100° turn in the helix (i.e., the helix has 3.6 residues per turn) and a translation of 1.5 Å (0.15 nm) along the helical axis. (Short pieces of left-handed helix sometimes occur with a large content of achiral glycine amino acids, but are unfavorable for the other normal, biological L-amino acids.) The pitch of the alpha helix (the vertical distance between consecutive turns of the helix) is 5.4 Å (0.54 nm), which is the product of 1.5 and 3.6. Most importantly, the N-H group of an amino acid forms a hydrogen bond with the C=O group of the amino acid four residues earlier; this repeated hydrogen bonding is the most prominent characteristic of an α-helix. Official international nomenclature specifies two ways of defining α-helices: rule 6.2 in terms of repeating φ,ψ torsion angles, and rule 6.3 in terms of the combined pattern of pitch and hydrogen bonding.

Different amino-acid sequences have different propensities for forming α-helical structure. Methionine, alanine, leucine, uncharged glutamate, and lysine ("MALEK" in the amino-acid 1-letter codes) all have especially high helix-forming propensities, whereas proline and glycine have poor helix-forming propensities. Proline either breaks or kinks a helix, both because it cannot donate an amide hydrogen bond (having no amide hydrogen) and because its side chain interferes sterically with the backbone of the preceding turn; inside a helix, this forces a bend of about 30° in the helix axis.[9] However, proline is often seen as the first residue of a helix, presumably due to its structural rigidity. At the other extreme, glycine also tends to disrupt helices because its high conformational flexibility makes it entropically expensive to adopt the relatively constrained α-helical structure[19].
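The helix arithmetic above (100° per residue, 5.4 Å pitch, and the i to i−4 hydrogen bond) can be checked with a short sketch. The function names are illustrative only, the coordinates describe an idealized helix rather than a real structure, and the radius of 2.3 Å is the value quoted for the α-helix in the comparison table of helix types in this chapter.

```python
import math

RESIDUES_PER_TURN = 3.6   # from the text
RISE_PER_RESIDUE = 1.5    # Å, from the text
RADIUS = 2.3              # Å, α-helix radius from the table of helix types


def rotation_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / residues_per_turn


def pitch(residues_per_turn=RESIDUES_PER_TURN, rise=RISE_PER_RESIDUE):
    """Distance along the axis covered by one full turn, in Å."""
    return residues_per_turn * rise


def helix_coordinates(n_residues, residues_per_turn=RESIDUES_PER_TURN,
                      rise=RISE_PER_RESIDUE, radius=RADIUS):
    """One (x, y, z) point per residue on an ideal right-handed helix."""
    step = math.radians(rotation_per_residue(residues_per_turn))
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step), i * rise)
        for i in range(n_residues)
    ]


def backbone_hbond_pairs(n_residues):
    """(donor, acceptor) residue indices: N-H of residue i to C=O of residue i-4."""
    return [(i, i - 4) for i in range(4, n_residues)]


if __name__ == "__main__":
    print("rotation per residue (degrees):", rotation_per_residue())
    print("pitch (Å):", pitch())
    print("hydrogen bonds in a 10-residue helix:", backbone_hbond_pairs(10))
```

Since 3.6 residues per turn is not an integer, the helix only returns to its starting orientation after 18 residues (five full turns).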

Representation of a beta hairpin
Greek-key motif in protein structure.

β sheet

The first β sheet structure was proposed by William Astbury in the 1930s. He proposed the idea of hydrogen bonding between the peptide bonds of parallel or antiparallel extended β strands. However, Astbury did not have the necessary data on the bond geometry of the amino acids in order to build accurate models, especially since he did not then know that the peptide bond was planar. A refined version was proposed by Linus Pauling and Robert Corey in 1951.

The β sheet (also β-pleated sheet) is the second form of regular secondary structure in proteins, only somewhat less common than alpha helix. Beta sheets consist of beta strands connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. A beta strand (also β strand) is a stretch of polypeptide chain typically 3 to 10 amino acids long with backbone in an almost fully extended conformation.

A very simple structural motif involving β sheets is the β hairpin, in which two antiparallel strands are linked by a short loop of two to five residues, of which one is frequently a glycine or a proline, both of which can assume the unusual dihedral-angle conformations required for a tight turn. However, individual strands can also be linked in more elaborate ways with long loops that may contain alpha helices or even entire protein domains.

Greek key motif

The Greek key motif consists of four adjacent antiparallel strands and their linking loops. Three of the strands are connected by hairpins, while the fourth is adjacent to the first and linked to the third by a longer loop. This type of structure forms easily during the protein folding process.[20][21] It was named after a pattern common to Greek ornamental artwork (see meander (art)).

The β-α-β motif

Due to the chirality of their component amino acids, all strands exhibit a "right-handed" twist evident in most higher-order β sheet structures. In particular, the linking loop between two parallel strands almost always has a right-handed crossover chirality, which is strongly favored by the inherent twist of the sheet. This linking loop frequently contains a helical region, in which case it is called a β-α-β motif. A closely related motif called a β-α-β-α motif forms the basic component of the most commonly observed protein tertiary structure, the TIM barrel.

β-meander motif

A simple supersecondary protein topology composed of 2 or more consecutive antiparallel β-strands linked together by hairpin loops.[22][23] This motif is common in β-sheets and can be found in several structural architectures including β-barrels and β-propellers.

Psi-loop motif
Portion of Carboxypeptidase A.

Psi-loop motif

The psi-loop (Ψ-loop) motif consists of two antiparallel strands with one strand in between that is connected to both by hydrogen bonds.[24] There are four possible strand topologies for single Ψ-loops as cited by Hutchinson et al. (1990). This motif is rare, as the process resulting in its formation seems unlikely to occur during protein folding. The Ψ-loop was first identified in the aspartic protease family.[25]

Coiled coils

Francis Crick proposed the possibility of coiled coils for α-keratin in 1952, together with mathematical methods for determining their structure. Remarkably, this was soon after the structure of the alpha helix was suggested in 1951 by Linus Pauling and coworkers.

Coiled coils usually contain a repeated pattern, hxxhcxc, of hydrophobic (h) and charged (c) amino-acid residues, referred to as a heptad repeat. The positions in the heptad repeat are usually labeled abcdefg, where a and d are the hydrophobic positions, often occupied by isoleucine, leucine or valine. Folding a sequence with this repeating pattern into an alpha-helical secondary structure causes the hydrophobic residues to be presented as a 'stripe' that coils gently around the helix in left-handed fashion, forming an amphipathic structure. The most favorable way for two such helices to arrange themselves in the water-filled environment of the cytoplasm is to wrap the hydrophobic stripes against each other, sandwiched between the hydrophilic amino acids. It is thus the burial of hydrophobic surfaces that provides the thermodynamic driving force for the oligomerization. The packing in a coiled-coil interface is exceptionally tight, with almost complete van der Waals contact between the side chains of the a and d residues. This tight packing was originally predicted by Francis Crick in 1952 and is referred to as "knobs into holes" packing. The α-helices may be parallel or anti-parallel, and usually adopt a left-handed super-coil. Although disfavored, a few right-handed coiled coils have also been observed in nature and in designed proteins[26].
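The heptad labeling described above can be illustrated with a short sketch that assigns the positions a to g along a sequence and pulls out the residues at the a and d positions. The function names and the 21-residue toy sequence are hypothetical, made up for illustration; real coiled-coil prediction tools use statistical scoring rather than this simple register assignment.

```python
HEPTAD = "abcdefg"


def heptad_register(sequence, start="a"):
    """Pair each residue with its heptad position, starting at position `start`."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def core_residues(sequence, start="a"):
    """Residues that fall on the hydrophobic a and d positions."""
    return [res for res, pos in heptad_register(sequence, start) if pos in "ad"]


def hydrophobic_core_fraction(sequence, start="a", hydrophobic="ILV"):
    """Fraction of a/d positions occupied by isoleucine, leucine or valine."""
    core = core_residues(sequence, start)
    if not core:
        return 0.0
    return sum(res in hydrophobic for res in core) / len(core)


if __name__ == "__main__":
    toy = "IEALKEK" * 3  # hypothetical sequence with I at a and L at d
    for res, pos in heptad_register(toy):
        print(pos, res)
    print("core residues:", "".join(core_residues(toy)))
    print("hydrophobic fraction:", hydrophobic_core_fraction(toy))
```

Trying each of the seven possible starting positions and keeping the one with the highest hydrophobic fraction is a simple way to guess the register of an unknown sequence.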

Structural features of the three major forms of protein helices[27]

| Geometry attribute | α-helix | 3₁₀ helix | π-helix |
|---|---|---|---|
| Residues per turn | 3.6 | 3.0 | 4.4 |
| Translation per residue | 1.5 Å | 2.0 Å | 1.1 Å |
| Radius of helix | 2.3 Å | 1.9 Å | 2.8 Å |
| Pitch | 5.4 Å | 6.0 Å | 4.8 Å |

The four levels of protein structure, from top to bottom: primary structure, secondary structure (β-sheet left, right α-helix), tertiary and quaternary structure.

### Tertiary structure of protein

Tertiary structure is considered to be largely determined by the protein's primary structure, the sequence of amino acids of which it is composed. Efforts to predict tertiary structure from the primary structure are known generally as protein structure prediction. However, the environment in which a protein is synthesized and allowed to fold is a significant determinant of its final shape and is usually not directly taken into account by current prediction methods.

In globular proteins, tertiary interactions are frequently stabilized by the sequestration of hydrophobic amino acid residues in the protein core, from which water is excluded, and by the consequent enrichment of charged or hydrophilic residues on the protein's water-exposed surface. In secreted proteins that do not spend time in the cytoplasm, disulfide bonds between cysteine residues help to maintain the protein's tertiary structure. A variety of common and stable tertiary structures appear in a large number of proteins that are unrelated in both function and evolution. For example, many proteins are shaped like a TIM barrel, named for the enzyme triosephosphate isomerase. Another common structure is a highly stable coiled-coil structure composed of 2-7 alpha helices.

The majority of protein structures known to date have been solved with the experimental technique of X-ray crystallography, which typically provides data of high resolution but provides no time-dependent information on the protein's conformational flexibility. A second common way of solving protein structures uses NMR, which provides somewhat lower-resolution data in general and is limited to relatively small proteins, but can provide time-dependent information about the motion of a protein in solution. Dual polarisation interferometry is a time-resolved analytical method for determining the overall conformation and conformational changes in surface-captured proteins, providing complementary information to these high-resolution methods. More is known about the tertiary structural features of soluble globular proteins than about membrane proteins because the latter class is extremely difficult to study using these methods[28].

### Quaternary structure of proteins

Many proteins are actually assemblies of more than one polypeptide chain, which in the context of the larger assemblage are known as protein subunits. In addition to the tertiary structure of the subunits, multiple-subunit proteins possess a quaternary structure, which is the arrangement into which the subunits assemble. Enzymes composed of subunits with diverse functions are sometimes called holoenzymes, in which some parts may be known as regulatory subunits and the functional core is known as the catalytic subunit. Examples of proteins with quaternary structure include hemoglobin, DNA polymerase, and ion channels. Other assemblies, referred to instead as multiprotein complexes, also possess quaternary structure. Examples include nucleosomes and microtubules.

Changes in quaternary structure can occur through conformational changes within individual subunits or through reorientation of the subunits relative to each other. It is through such changes, which underlie cooperativity and allostery in "multimeric" enzymes, that many proteins undergo regulation and perform their physiological function. The above definition follows a classical approach to biochemistry, established at times when the distinction between a protein and a functional, proteinaceous unit was difficult to elucidate. More recently, people refer to protein-protein interaction when discussing quaternary structure of proteins and consider all assemblies of proteins as protein complexes[29].

## Protein structure determination

Ribbon diagram of the structure of myoglobin, showing colored alpha helices. Such proteins are long, linear molecules with thousands of atoms; yet the relative position of each atom has been determined with sub-atomic resolution by X-ray crystallography. Since it is difficult to visualize all the atoms at once, the ribbon shows the rough path of the protein polymer from its N-terminus (blue) to its C-terminus (red).

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method measures the 3D density distribution of electrons in the protein (in the crystallized state), from which the 3D coordinates of all the atoms can be inferred to a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance techniques. The secondary structure composition can be determined via circular dichroism or dual polarisation interferometry. Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (less than 5 angstroms or 0.5 nanometers) and is anticipated to increase in power as a tool for high-resolution work in the next decade. This technique is still a valuable resource for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.

### X-ray crystallography

X-ray crystallography of biological molecules took off with Dorothy Crowfoot Hodgkin, who solved the structures of cholesterol (1937), penicillin (1945) and vitamin B12 (1954), for which she was awarded the Nobel Prize in Chemistry in 1964. In 1969, she succeeded in solving the structure of insulin, on which she worked for over thirty years.[30]

X-ray crystallography is a method of determining the arrangement of atoms within a crystal, in which a beam of X-rays strikes a crystal and diffracts into many specific directions. Crystal structures of proteins (which are irregular and hundreds of times larger than cholesterol) began to be solved in the late 1950s, beginning with the structure of sperm whale myoglobin by Sir John Cowdery Kendrew, for which he shared the Nobel Prize in Chemistry with Max Perutz in 1962.[31] Since that success, over 61840 X-ray crystal structures of proteins, nucleic acids and other biological molecules have been determined.[32] For comparison, the nearest competing method in terms of structures analyzed is nuclear magnetic resonance (NMR) spectroscopy, which has resolved 8759 chemical structures.[33] Moreover, crystallography can solve structures of arbitrarily large molecules, whereas solution-state NMR is restricted to relatively small ones (less than 70 kDa).

X-ray crystallography is now used routinely by scientists to determine how a pharmaceutical drug interacts with its protein target and what changes might improve it.[34] However, intrinsic membrane proteins remain challenging to crystallize because they require detergents or other means to solubilize them in isolation, and such detergents often interfere with crystallization. Such membrane proteins are encoded by a large fraction of the genome and include many proteins of great physiological importance, such as ion channels and receptors.[35][36]

### Nuclear magnetic resonance spectroscopy or NMR

Protein nuclear magnetic resonance spectroscopy (usually abbreviated protein NMR) is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins. The field was pioneered by Richard R. Ernst and Kurt Wüthrich,[1] among others. Protein NMR techniques are continually being used and improved in both academia and the biotech industry. Structure determination by NMR spectroscopy usually consists of several successive phases, each using a separate set of highly specialized techniques: the sample is prepared, resonances are assigned, restraints are generated, and a structure is calculated and validated.

## How to sequence a protein?

Protein sequencing is a technique to determine the amino acid sequence of a protein, as well as which conformation the protein adopts and the extent to which it is complexed with any non-peptide molecules. Discovering the structures and functions of proteins in living organisms is an important tool for understanding cellular processes, and allows drugs that target specific metabolic pathways to be invented more easily. The two major direct methods of protein sequencing are mass spectrometry and the Edman degradation reaction. It is also possible to generate an amino acid sequence from the DNA or mRNA sequence encoding the protein, if this is known. However, there are a number of other reactions which can be used to gain more limited information about protein sequences and can be used as preliminaries to the aforementioned methods of sequencing or to overcome specific inadequacies within them[37].

The Edman degradation is a very important reaction for protein sequencing, because it allows the ordered amino acid composition of a protein to be discovered. Automated Edman sequencers are now in widespread use, and are able to sequence peptides up to approximately 50 amino acids long. A reaction scheme for sequencing a protein by the Edman degradation follows; some of the steps are elaborated on subsequently.

1. Break any disulfide bridges in the protein with an oxidising agent like performic acid or a reducing agent like 2-mercaptoethanol. A protecting group such as iodoacetic acid may be necessary to prevent the bonds from re-forming.
2. Separate and purify the individual chains of the protein complex, if there is more than one.
3. Determine the amino acid composition of each chain.
4. Determine the terminal amino acids of each chain.
5. Break each chain into fragments under 50 amino acids long.
6. Separate and purify the fragments.
7. Determine the sequence of each fragment.
8. Repeat with a different pattern of cleavage.
9. Construct the sequence of the overall protein.

Digestion into peptide fragments

Peptides longer than about 50-70 amino acids cannot be sequenced reliably by the Edman degradation. Because of this, long protein chains need to be broken up into small fragments which can then be sequenced individually. Digestion is done either by endopeptidases such as trypsin or pepsin or by chemical reagents such as cyanogen bromide. Different enzymes give different cleavage patterns, and the overlap between fragments can be used to construct an overall sequence.
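To see why two different cleavage patterns help, the following sketch digests a sequence in silico with two reagents. It assumes the commonly cited simplified rules, which are not spelled out in the text above: trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and cyanogen bromide cleaves after methionine (M). The function names and the 16-residue sequence are hypothetical, chosen only for illustration.

```python
def cleave_after(sequence, residues, blocked_by=""):
    """Split `sequence` after any residue in `residues`.

    A site is skipped when the following residue is in `blocked_by`.
    """
    if not sequence:
        return []
    fragments, start = [], 0
    for i, aa in enumerate(sequence[:-1]):  # never cut after the last residue
        if aa in residues and sequence[i + 1] not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def trypsin(sequence):
    """Simplified trypsin rule: after K or R, but not before P."""
    return cleave_after(sequence, "KR", blocked_by="P")


def cyanogen_bromide(sequence):
    """Simplified cyanogen bromide rule: after M."""
    return cleave_after(sequence, "M")


if __name__ == "__main__":
    protein = "GAKMTLRPEWKFMDSR"  # hypothetical sequence
    print("trypsin:         ", trypsin(protein))
    print("cyanogen bromide:", cyanogen_bromide(protein))
```

With this toy sequence, the cyanogen bromide fragment `TLRPEWKFM` spans the junction between the tryptic fragments `MTLRPEWK` and `FMDSR`, which is the kind of overlap used to place fragments in order.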

Phenylisothiocyanate is reacted with an uncharged terminal amino group, under mildly alkaline conditions, to form a cyclical phenylthiocarbamoyl derivative. Then, under acidic conditions, this derivative of the terminal amino acid is cleaved as a thiazolinone derivative. The thiazolinone amino acid is then selectively extracted into an organic solvent and treated with acid to form the more stable phenylthiohydantoin (PTH)-amino acid derivative, which can be identified by chromatography or electrophoresis. This procedure can then be repeated to identify the next amino acid.

A major drawback of this technique is that the peptides being sequenced cannot have more than 50 to 60 residues (and in practice, under 30). The peptide length is limited because the cyclical derivatization does not always go to completion. The derivatization problem can be resolved by cleaving large peptides into smaller peptides before proceeding with the reaction. Modern machines, capable of over 99% efficiency per amino acid, can accurately sequence up to 30 amino acids. An advantage of the Edman degradation is that it uses only 10-100 picomoles of peptide for the sequencing process. The Edman degradation reaction is automated to speed up the process.[38][39]
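The length limit follows from simple arithmetic: if each cycle succeeds with some efficiency, the fraction of chains still in step after n cycles is that efficiency raised to the power n. The sketch below works this out. The function names are illustrative, and the 50% cut-off used in the example is an arbitrary assumption for demonstration, not a figure from the text.

```python
def cumulative_yield(efficiency, cycles):
    """Fraction of peptide chains still in register after `cycles` Edman cycles."""
    return efficiency ** cycles


def max_cycles(efficiency, min_yield):
    """Largest number of cycles for which the cumulative yield stays >= min_yield."""
    if not (0.0 < efficiency < 1.0 and 0.0 < min_yield <= 1.0):
        raise ValueError("expected 0 < efficiency < 1 and 0 < min_yield <= 1")
    cycles = 0
    while cumulative_yield(efficiency, cycles + 1) >= min_yield:
        cycles += 1
    return cycles


if __name__ == "__main__":
    for eff in (0.95, 0.99):
        for n in (10, 30, 50):
            print(f"efficiency {eff:.2f}, {n} cycles: "
                  f"{cumulative_yield(eff, n):.3f} of chains in step")
    # Assumed cut-off: stop when fewer than half the chains are in register.
    print("cycles before yield drops below 50% at 99%:", max_cycles(0.99, 0.5))
```

Even at 99% per-cycle efficiency roughly a quarter of the chains are out of step after 30 cycles, which is consistent with the practical limit quoted above.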

### N-terminal amino acid analysis

Determining which amino acid forms the N-terminus of a peptide chain is useful for two reasons: to aid the ordering of individual peptide fragments' sequences into a whole chain, and because the first round of Edman degradation is often contaminated by impurities and therefore does not give an accurate determination of the N-terminal amino acid. A generalised method for N-terminal amino acid analysis follows:

1. React the peptide with a reagent which will selectively label the terminal amino acid.
2. Hydrolyse the protein.
3. Determine the amino acid by chromatography and comparison with standards.

There are many different reagents which can be used to label terminal amino acids. They all react with amine groups and will therefore also bind to amine groups in the side chains of amino acids such as lysine. For this reason it is necessary to be careful in interpreting chromatograms to ensure that the right spot is chosen. Two of the more common reagents are Sanger's reagent (1-fluoro-2,4-dinitrobenzene) and dansyl derivatives such as dansyl chloride. Phenylisothiocyanate, the reagent for the Edman degradation, can also be used.

The same questions apply here as in the determination of amino acid composition, with the exception that no stain is needed, as the reagents produce coloured derivatives, and only qualitative analysis is required, so the amino acid does not have to be eluted from the chromatography column, just compared with a standard. Another consideration is that, since any amine groups will have reacted with the labelling reagent, ion exchange chromatography cannot be used, and thin layer chromatography or high pressure liquid chromatography should be used instead[40].

### C-terminal amino acid analysis

The number of methods available for C-terminal amino acid analysis is much smaller than the number of available methods of N-terminal analysis. The most common method is to add carboxypeptidases to a solution of the protein, take samples at regular intervals, and determine the terminal amino acid by analysing a plot of amino acid concentrations against time.

### Mass spectrometry

Present-day researchers use mass spectrometry as an important tool for the characterization of proteins. The two primary methods for ionization of whole proteins are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). In keeping with the performance and mass range of available mass spectrometers, two approaches are used for characterizing proteins. In the first, intact proteins are ionized by either of the two techniques described above and then introduced to a mass analyzer. This approach is referred to as the "top-down" strategy of protein analysis. In the second, proteins are enzymatically digested into smaller peptides using a protease such as trypsin. Subsequently these peptides are introduced into the mass spectrometer and identified by peptide mass fingerprinting or tandem mass spectrometry. Hence, this latter approach (also called "bottom-up" proteomics) uses identification at the peptide level to infer the existence of proteins.
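Peptide mass fingerprinting rests on the fact that a peptide's mass can be calculated from its sequence. The sketch below adds up standard monoisotopic residue masses, adds one water molecule for the free termini, and converts the result to the m/z of a protonated ion. The function names, the 0.01 tolerance, and the example peptides are illustrative assumptions; real search engines also handle modifications, missed cleavages, and isotope patterns, none of which is modeled here.

```python
# Monoisotopic residue masses in daltons (standard reference values).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of the charge carrier


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER


def mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def count_matches(observed_mz, candidate_peptides, tolerance=0.01):
    """Number of observed peaks explained by singly charged candidate peptides."""
    theoretical = [mz(p) for p in candidate_peptides]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_mz
    )


if __name__ == "__main__":
    peptides = ["GAK", "MTLRPEWK", "FMDSR"]  # hypothetical tryptic peptides
    for p in peptides:
        print(p, round(mz(p), 4))
```

Note that leucine and isoleucine have identical masses, so a mass measurement alone cannot distinguish them.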

Whole protein mass analysis is primarily conducted using either time-of-flight (TOF) MS or Fourier transform ion cyclotron resonance (FT-ICR). These two types of instrument are preferable here because of their wide mass range and, in the case of FT-ICR, its high mass accuracy. Mass analysis of proteolytic peptides is a much more popular method of protein characterization, as cheaper instrument designs can be used. Additionally, sample preparation is easier once whole proteins have been digested into smaller peptide fragments. The most widely used instruments for peptide mass analysis are the MALDI time-of-flight instruments, as they permit the acquisition of peptide mass fingerprints (PMFs) at a high pace (1 PMF can be analyzed in approx. 10 sec). Multiple-stage quadrupole-time-of-flight and quadrupole ion trap instruments also find use in this application.

## Types of protein

### Conjugated protein

A conjugated protein is a protein that functions in interaction with other chemical groups attached by covalent bonds or by weak interactions. Many proteins contain only amino acids and no other chemical groups; these are called simple proteins. Other proteins yield, on hydrolysis, some other chemical component in addition to amino acids; these are called conjugated proteins. The non-amino-acid part of a conjugated protein is usually called its prosthetic group. Most prosthetic groups are formed from vitamins. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups. Some examples of conjugated proteins are described below.

#### Lipoproteins

A lipoprotein is a biochemical assembly that contains both proteins and lipids bound to the proteins. Many enzymes, transporters, structural proteins, antigens, adhesins and toxins are lipoproteins. Examples include the high density (HDL) and low density (LDL) lipoproteins, which enable fats to be carried in the blood stream, the transmembrane proteins of the mitochondrion and the chloroplast, and bacterial lipoproteins.

#### Glycoproteins

Glycoproteins are proteins that contain oligosaccharide chains (glycans) covalently attached to polypeptide side-chains. The carbohydrate is attached to the protein in a cotranslational or posttranslational modification. This process is known as glycosylation. In proteins that have segments extending extracellularly, the extracellular segments are often glycosylated. Glycoproteins are often important integral membrane proteins, where they play a role in cell-cell interactions. Glycoproteins also occur in the cytosol, but their functions and the pathways producing these modifications in this compartment are less well understood.

Glycoproteins are generally the largest and most abundant group of conjugated proteins. They range from glycoproteins in cell surface membranes that constitute the glycocalyx to important antibodies produced by leukocytes.

#### Phosphoproteins

Phosphoproteins are proteins which are chemically bonded to a substance containing phosphoric acid (see phosphorylation for more). This category of organic molecules includes Fc receptors, Ulks, calcineurins, K chips, and urocortins.

#### Metalloprotein

A protein that contains a metal ion as a cofactor is known as a metalloprotein. Metalloproteins have many different functions in cells, such as enzymes, transport and storage proteins, and signal transduction proteins. Indeed, about one quarter to one third of all proteins require metals to carry out their functions. The metal ion is usually coordinated by nitrogen, oxygen or sulfur atoms belonging to amino acids in the polypeptide chain and/or a macrocyclic ligand incorporated into the protein. The presence of the metal ion allows metalloenzymes to perform functions such as redox reactions that cannot easily be performed by the limited set of functional groups found in amino acids.

Computer-generated 3-D representation of the zinc finger motif of proteins, consisting of an α helix and an antiparallel β sheet. The zinc ion (green) is coordinated by two histidine residues and two cysteine residues.

| Metal ion | Examples of enzymes containing this ion |
|---|---|
| Magnesium | Glucose 6-phosphatase, Hexokinase, DNA polymerase |
| Manganese | Arginase |
| Iron | Catalase, Hydrogenase, IRE-BP, Aconitase |
| Nickel[41] | Urease, Hydrogenase |
| Copper | Cytochrome oxidase, Laccase |
| Zinc | Alcohol dehydrogenase, Carboxypeptidase, Aminopeptidase, Beta amyloid |
| Molybdenum | Nitrate reductase |
| Selenium | Glutathione peroxidase |
| Various | Metallothionein, Phosphatase |

#### Hemoproteins

A hemeprotein (also hemoprotein, haemoprotein, or heme protein) is a metalloprotein containing a heme prosthetic group, either covalently or noncovalently bound to the protein itself. The iron in the heme is capable of undergoing oxidation and reduction (usually between the +2 and +3 states, though stabilized Fe+4 and even Fe+5 species are well known in the peroxidases). Hemoproteins probably evolved from a primordial strategy for incorporating the iron (Fe) atom contained within the protoporphyrin IX ring of heme into proteins. This strategy has been maintained throughout evolution because it makes hemoproteins responsive to molecules that can bind divalent iron (Fe). These molecules include, but are probably not restricted to, gaseous molecules such as oxygen (O2), nitric oxide (NO), carbon monoxide (CO) and hydrogen sulfide (H2S). Once bound to the prosthetic heme groups of hemoproteins, these gaseous molecules can modulate the activity or function of those hemoproteins in a way that is said to afford signal transduction. Therefore, when produced in biological systems (cells), these gaseous molecules are referred to as gasotransmitters.

Haemoglobin contains an iron-containing prosthetic group, the haem. The haem group carries the oxygen molecule, which binds to the iron ion (Fe2+) found in the haem group[42].

Hemoglobin

Hemoglobin (also spelled haemoglobin and abbreviated Hb or Hgb) is the iron-containing oxygen-transport metalloprotein in the red blood cells of all vertebrates[1] (except the fish family Channichthyidae) and the tissues of some invertebrates. Hemoglobin in the blood transports oxygen from the lungs or gills to the rest of the body (i.e. the tissues), where it releases the oxygen for cell use and collects carbon dioxide to bring it back to the lungs. In mammals the protein makes up about 97% of the red blood cells' dry content, and around 35% of the total content (including water). Hemoglobin has an oxygen binding capacity of 1.34 ml O2 per gram of hemoglobin, which increases the total blood oxygen capacity seventyfold.

Hemoglobin is involved in the transport of other gases: it carries some of the body's respiratory carbon dioxide (about 10% of the total) as carbaminohemoglobin, in which CO2 is bound to the globin protein. The molecule also carries the important regulatory molecule nitric oxide bound to a globin protein thiol group, releasing it at the same time as oxygen.

Hemoglobin is also found outside red blood cells and their progenitor lines. Other cells that contain hemoglobin include the A9 dopaminergic neurons in the substantia nigra, macrophages, alveolar cells, and mesangial cells in the kidney. In these tissues, hemoglobin has a non-oxygen-carrying function as an antioxidant and a regulator of iron metabolism. Hemoglobin and hemoglobin-like molecules are also found in many invertebrates, fungi, and plants. In these organisms, hemoglobins may carry oxygen, or they may act to transport and regulate other things such as carbon dioxide, nitric oxide, hydrogen sulfide and sulfide. A variant of the molecule, called leghemoglobin, is used to scavenge oxygen, to keep it from poisoning anaerobic systems such as the nitrogen-fixing nodules of leguminous plants.
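The "seventyfold" figure can be reproduced with a rough calculation. Only the binding capacity of 1.34 ml O2 per gram comes from the text; the other inputs are assumptions taken from typical textbook physiology values (a hemoglobin concentration of 15 g per dl of blood, an oxygen solubility in plasma of 0.003 ml O2 per dl per mmHg, and an arterial oxygen partial pressure of 100 mmHg). The function names are illustrative.

```python
O2_PER_GRAM_HB = 1.34  # ml O2 per g hemoglobin (from the text)


def bound_o2(hb_g_per_dl, saturation=1.0):
    """Oxygen carried by hemoglobin, in ml O2 per dl of blood."""
    return O2_PER_GRAM_HB * hb_g_per_dl * saturation


def dissolved_o2(po2_mmhg, solubility=0.003):
    """Oxygen physically dissolved in plasma, in ml O2 per dl of blood.

    The default solubility (ml O2 per dl per mmHg) is an assumed textbook value.
    """
    return solubility * po2_mmhg


if __name__ == "__main__":
    hb = 15.0     # g/dl, assumed typical adult value
    po2 = 100.0   # mmHg, assumed arterial partial pressure
    carried = bound_o2(hb)
    dissolved = dissolved_o2(po2)
    print("bound to hemoglobin (ml O2/dl):", round(carried, 2))
    print("dissolved in plasma (ml O2/dl):", round(dissolved, 2))
    print("fold increase:", round((carried + dissolved) / dissolved))
```

Under these assumptions hemoglobin-bound oxygen is a little under seventy times the dissolved amount, in line with the figure quoted above; different assumed inputs shift the ratio somewhat.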

Cytochromes

Cytochrome c with heme c.

Cytochromes are, in general, membrane-bound hemoproteins that contain heme groups and carry out electron transport. They are found either as monomeric proteins (e.g., cytochrome c) or as subunits of bigger enzymatic complexes that catalyze redox reactions. They are found in the mitochondrial inner membrane and endoplasmic reticulum of eukaryotes, in the chloroplasts of plants, in photosynthetic microorganisms, and in bacteria.

| Cytochromes | Combination |
|---|---|
| a and a3 | Cytochrome c oxidase ("Complex IV") with electrons delivered to complex by soluble cytochrome c (hence the name) |
| b and c1 | Coenzyme Q - cytochrome c reductase ("Complex III") |
| b6 and f | Plastoquinol-plastocyanin reductase |

3-dimensional structure of bovine rhodopsin. The seven transmembrane domains are shown in varying colors. The chromophore is shown in red.

| Type | Prosthetic group |
|---|---|
| Cytochrome a | heme a |
| Cytochrome b | heme b |
| Cytochrome d | tetrapyrrolic chelate of iron |

#### Opsins

Opsins are a group of light-sensitive 35-55 kDa membrane-bound G protein-coupled receptors of the retinylidene protein family found in photoreceptor cells of the retina. Five classical groups of opsins are involved in vision, mediating the conversion of a photon of light into an electrochemical signal, the first step in the visual transduction cascade. Another opsin found in the mammalian retina, melanopsin, is involved in circadian rhythms and pupillary reflex but not in image-forming.

#### Flavoproteins

Flavoproteins are proteins that contain a nucleic acid derivative of riboflavin: the flavin adenine dinucleotide (FAD) or flavin mononucleotide (FMN). Flavoproteins are involved in a wide array of biological processes, including, but by no means limited to, bioluminescence, removal of radicals contributing to oxidative stress, photosynthesis, DNA repair, and apoptosis. The spectroscopic properties of the flavin cofactor make it a natural reporter for changes occurring within the active site; this makes flavoproteins one of the most-studied enzyme families.

### Simple proteins

Proteins that yield only amino acids upon hydrolysis are known as simple proteins.

#### Albumin

Albumin (Latin: albus, white) refers generally to any protein that is water-soluble, moderately soluble in concentrated salt solutions, and denatured by heat. Albumins are commonly found in blood plasma and differ from other plasma proteins in that they are not glycosylated. Substances containing albumin, such as egg white, are called albuminoids.

#### Globulin

Globulin is one of the three types of serum proteins, the others being albumin and fibrinogen. Some globulins are produced in the liver, while others are made by the immune system. The term globulin encompasses a heterogeneous group of proteins that typically have a high molecular weight, and whose solubility and electrophoretic migration rates are both lower than those of albumin.

#### Histones

In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei, which package and order the DNA into structural units called nucleosomes. They are the chief protein components of chromatin, acting as spools around which DNA winds, and play a role in gene regulation.

### Derived protein

#### Peptones

Peptones are derived from animal milk or meat by proteolytic digestion. In addition to small peptides, the resulting spray-dried material includes fats, metals, salts, vitamins and many other biological compounds. Peptone is used in nutrient media for growing bacteria and fungi.

#### Proteases

Proteases occur naturally in all organisms. These enzymes are involved in a multitude of physiological reactions, from simple digestion of food proteins to highly regulated cascades (e.g., the blood-clotting cascade, the complement system, apoptosis pathways, and the invertebrate prophenoloxidase-activating cascade). Proteases can either break specific peptide bonds (limited proteolysis), depending on the amino acid sequence of a protein, or break down a complete peptide to amino acids (unlimited proteolysis). The activity can be a destructive change, abolishing a protein's function or digesting it to its principal components; it can be an activation of a function; or it can be a signal in a signaling pathway.
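The difference between limited and unlimited proteolysis can be sketched in a few lines of Python. The cleavage rule used here (cut after lysine or arginine unless the next residue is proline) is the textbook rule usually given for trypsin; it is chosen only as an illustration, and the function names and the test peptide are hypothetical.

```python
def limited_digest(sequence, cleave_after="KR", blocked_by="P"):
    """Limited proteolysis: cut only at sequence-specific sites.

    A bond is cut after any residue in `cleave_after`, unless the following
    residue is in `blocked_by`. The defaults mimic the rule commonly quoted
    for trypsin (an assumption made for this sketch).
    """
    if not sequence:
        return []
    fragments = []
    start = 0
    # The last residue has no peptide bond after it, so stop one short.
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def unlimited_digest(sequence):
    """Unlimited proteolysis: every peptide bond is broken."""
    return list(sequence)


peptide = "GAKRPLKMW"  # made-up peptide in one-letter code
print(limited_digest(peptide))    # ['GAK', 'RPLK', 'MW']
print(unlimited_digest(peptide))  # one entry per amino acid
```

The bond after the arginine is left intact because the next residue is proline, which shows how the outcome of limited proteolysis depends on the local sequence rather than on the protease alone.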

## Protein data bank or PDB

Like fuel and flame, two forces converged to initiate the Protein Data Bank (PDB): 1) a small but growing database of protein structures determined by X-ray diffraction and 2) the newly available (1968) molecular graphics display, the Brookhaven RAster Display (BRAD), used to inspect these structures in 3-D. In 1969, with the sponsorship of Dr. Walter Hamilton at the Brookhaven National Laboratory, Dr. Edgar Meyer (Texas A&M University) began to write software to store atomic coordinate files in a common format, making them available for geometric and graphical evaluation. By 1971 the program SEARCH could be executed remotely to extract and examine structural data; it was thereby instrumental in initiating networking and marked the functional beginning of the PDB.

Upon Hamilton's death in 1973, Dr. Tom Koetzle took over direction of the PDB for the subsequent 20 years. In January 1994, Dr. Joel Sussman of Israel's Weizmann Institute of Science was appointed head of the PDB. In October 1998,[43] the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June 1999. The new director was Dr. Helen M. Berman of Rutgers University (one of the member institutions of the RCSB).[44] In 2003, with the formation of the wwPDB, the PDB became an international organization. The founding members are PDBe (Europe), RCSB (USA), and PDBj (Japan). The BMRB joined in 2006. Each of the four members of the wwPDB can act as a deposition, data processing and distribution center for PDB data. Data processing means that wwPDB staff review and annotate each submitted entry. The data are then automatically checked for plausibility (the source code for this validation software has been made available to the public at no charge).

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. (See also crystallographic database). The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.
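To give a feel for what a stored atomic coordinate file looks like, here is a minimal sketch that reads `ATOM` records in the legacy fixed-column PDB text format. The column positions follow the published format description; the helper names and the two sample atoms are invented for this example, and a real project would more likely rely on an established parsing library.

```python
# Column layout of an ATOM/HETATM record (0-based Python slices).
ATOM_FIELDS = {
    "serial":    (slice(6, 11), int),
    "name":      (slice(12, 16), str),
    "res_name":  (slice(17, 20), str),
    "chain_id":  (slice(21, 22), str),
    "res_seq":   (slice(22, 26), int),
    "x":         (slice(30, 38), float),
    "y":         (slice(38, 46), float),
    "z":         (slice(46, 54), float),
    "occupancy": (slice(54, 60), float),
    "b_factor":  (slice(60, 66), float),
    "element":   (slice(76, 78), str),
}


def parse_atom_line(line):
    """Return a dict for an ATOM/HETATM line, or None for any other record."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    padded = line.rstrip("\n").ljust(80)
    record = {}
    for field, (columns, convert) in ATOM_FIELDS.items():
        text = padded[columns].strip()
        record[field] = convert(text) if text else None  # blank column -> None
    return record


def format_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z, element):
    """Build a made-up ATOM line so that the example needs no download."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:^4} {res_name:>3} {chain_id}{res_seq:>4}{'':4}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}"
    )


sample = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    format_atom_line(1, "N", "GLY", "A", 1, 27.340, 24.430, 2.614, "N"),
    format_atom_line(2, "CA", "GLY", "A", 1, 26.266, 25.413, 2.842, "C"),
    "END",
])

atoms = [a for a in map(parse_atom_line, sample.splitlines()) if a is not None]
for atom in atoms:
    print(atom["serial"], atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Because every field lives at a fixed column, the parser slices the line rather than splitting on whitespace; splitting would fail whenever two adjacent columns are filled without a space between them.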

## Insulin

The structure of insulin. The left side is a space-filling model of the insulin monomer, believed to be biologically active. Carbon is green, hydrogen white, oxygen red, and nitrogen blue. On the right side is a ribbon diagram of the insulin hexamer, believed to be the stored form. A monomer unit is highlighted with the A chain in blue and the B chain in cyan. Yellow denotes disulfide bonds, and magenta spheres are zinc ions.
Insulin hexamers highlighting the threefold symmetry, the zinc ions (center) binding with histidine.

Within vertebrates, the amino acid sequence of insulin is extremely well conserved. Bovine insulin differs from human insulin in only three amino acid residues, and porcine insulin in one. Even insulin from some species of fish is similar enough to human insulin to be clinically effective in humans. Insulin in some invertebrates is quite similar in sequence to human insulin and has similar physiological effects. The strong homology seen in the insulin sequence of diverse species suggests that it has been conserved across much of animal evolutionary history. The C-peptide of proinsulin, however, differs much more amongst species; it is also a hormone, but a secondary one.
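The residue counts quoted above can be checked with a short comparison. The A and B chain sequences below are the commonly cited mature insulin sequences, typed in by hand for this sketch; they should be verified against a curated sequence database before being reused. The function name is hypothetical.

```python
# Mature insulin chains in one-letter code (hand-typed; verify before reuse).
HUMAN = {"A": "GIVEQCCTSICSLYQLENYCN", "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"}
PORCINE = {"A": "GIVEQCCTSICSLYQLENYCN", "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"}
BOVINE = {"A": "GIVEQCCASVCSLYQLENYCN", "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"}


def substitutions(reference, other):
    """List position-by-position differences between two insulins.

    Both arguments map a chain name to its sequence. Chains must have equal
    length, so no alignment step is needed for this simple case.
    """
    found = []
    for chain in sorted(reference):
        ref_seq, other_seq = reference[chain], other[chain]
        if len(ref_seq) != len(other_seq):
            raise ValueError(f"chain {chain}: lengths differ, alignment required")
        for position, (ref, alt) in enumerate(zip(ref_seq, other_seq), start=1):
            if ref != alt:
                found.append(f"{chain}{position} {ref}->{alt}")
    return found


print("porcine:", substitutions(HUMAN, PORCINE))  # ['B30 T->A']
print("bovine: ", substitutions(HUMAN, BOVINE))   # ['A8 T->A', 'A10 I->V', 'B30 T->A']
```

With these sequences the comparison reports one substitution for porcine insulin and three for bovine insulin, matching the counts in the text.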

Insulin is produced and stored in the body as a hexamer (a unit of six insulin molecules), while the active form is the monomer. The hexamer is an inactive form with long-term stability, which keeps the highly reactive insulin protected yet readily available. The hexamer-monomer conversion is one of the central aspects of insulin formulations for injection. The hexamer is far more stable than the monomer, which is desirable for practical reasons. The monomer, however, is a much faster-acting drug, because diffusion rate is inversely related to particle size. A fast-acting drug means that insulin injections do not have to precede mealtimes by hours, which in turn gives diabetics more flexibility in their daily schedule. Insulin can also aggregate and form fibrillar interdigitated beta-sheets. This can cause injection amyloidosis, and it prevents the storage of insulin for long periods[45].
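The statement that diffusion rate is inversely related to particle size can be made concrete with the Stokes-Einstein relation, D = kT / (6πηr), which is brought in here as an outside assumption rather than taken from the text. The sketch treats monomer and hexamer as compact spheres of equal density, so that the hexamer radius is the monomer radius times the cube root of six. The monomer radius, temperature and viscosity are placeholder values; only the ratio is meaningful.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def stokes_einstein_d(radius_m, temperature_k=310.0, viscosity_pa_s=0.7e-3):
    """Diffusion coefficient (m^2/s) of a sphere in a viscous fluid.

    The default temperature and viscosity are assumed, water-like values.
    """
    return BOLTZMANN_J_PER_K * temperature_k / (6 * math.pi * viscosity_pa_s * radius_m)


def oligomer_radius(monomer_radius_m, n_subunits):
    """Radius of a compact sphere with n_subunits times the monomer volume."""
    return monomer_radius_m * n_subunits ** (1.0 / 3.0)


monomer_radius_m = 1.3e-9  # placeholder radius, chosen only for illustration
hexamer_radius_m = oligomer_radius(monomer_radius_m, 6)

d_monomer = stokes_einstein_d(monomer_radius_m)
d_hexamer = stokes_einstein_d(hexamer_radius_m)
speedup = d_monomer / d_hexamer

print(f"monomer diffuses about {speedup:.2f} times faster than the hexamer")
```

Under these simplifying assumptions the monomer diffuses a little less than twice as fast as the hexamer. Absorption from an injection site involves more than free diffusion, so this is an order-of-magnitude illustration, not a pharmacokinetic model.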

In 1869 Paul Langerhans, a medical student in Berlin, was studying the structure of the pancreas under a microscope when he identified some previously unnoticed tissue clumps scattered throughout the bulk of the pancreas. The function of the "little heaps of cells," later known as the Islets of Langerhans, was unknown, but Edouard Laguesse later suggested that they might produce secretions that play a regulatory role in digestion. Paul Langerhans' son, Archibald, also helped to understand this regulatory role. The term insulin derives from insula, the Latin word for islet or island. In 1889, the Polish-German physician Oscar Minkowski, in collaboration with Joseph von Mering, removed the pancreas from a healthy dog to test its assumed role in digestion. Several days after the dog's pancreas was removed, Minkowski's animal keeper noticed a swarm of flies feeding on the dog's urine. On testing the urine they found that it contained sugar, establishing for the first time a relationship between the pancreas and diabetes. In 1901, another major step was taken by Eugene Opie, when he clearly established the link between the Islets of Langerhans and diabetes: "Diabetes mellitus … is caused by destruction of the islets of Langerhans and occurs only when these bodies are in part or wholly destroyed." Before his work, the link between the pancreas and diabetes was clear, but not the specific role of the islets.

The Nobel Prize committee in 1923 credited the practical extraction of insulin to a team at the University of Toronto and awarded the Nobel Prize in Physiology or Medicine to two men, Frederick Banting and J.J.R. Macleod, for the discovery of insulin. Banting, insulted that Charles Best was not mentioned, shared his prize with Best, and Macleod immediately shared his with James Collip. The patent for insulin was sold to the University of Toronto for a symbolic one dollar.

The primary structure of insulin was determined by the British molecular biologist Frederick Sanger. Insulin was the first protein to have its sequence determined, and Sanger was awarded the 1958 Nobel Prize in Chemistry for this work. In 1969, after decades of work, Dorothy Crowfoot Hodgkin determined the spatial conformation of the molecule, the so-called tertiary structure, by means of X-ray diffraction studies. She had been awarded a Nobel Prize in Chemistry in 1964 for the development of crystallography. Rosalyn Sussman Yalow received the 1977 Nobel Prize in Medicine for the development of the radioimmunoassay for insulin[46].

## References

1. http://en.wikipedia.org/w/index.php?title=Protein&oldid=425576197
2. http://en.wikipedia.org/w/index.php?title=Proteinogenic_amino_acid&oldid=420804587
3. http://en.wikipedia.org/w/index.php?title=Amino_acid&oldid=425389108
4. http://en.wikipedia.org/w/index.php?title=Amino_acid&oldid=425389108
5. Photo-leucine and photo-methionine allow identification of protein-protein interactions in living cells. Nature Methods: 4, 261–7, 2005
6. http://en.wikipedia.org/w/index.php?title=Peptide_bond&oldid=417601014
7. Basler B, Schuster O, Bach T (November 2005). "Conformationally constrained β-amino acid derivatives by intramolecular [2 + 2]-photocycloaddition of a tetronic acid amide and subsequent lactone ring opening". J. Org. Chem. 70 (24): 9798–808. doi:10.1021/jo0515226. PMID 16292808.
8. Murray JK, Farooqi B, Sadowsky JD, et al. (September 2005). "Efficient synthesis of a β-peptide combinatorial library with microwave irradiation". J. Am. Chem. Soc. 127 (38): 13271–80. doi:10.1021/ja052733v. PMID 16173757.
9. Seebach D, Matthews JL (1997). "β-Peptides: a surprise at every turn". Chem. Commun. (21): 2015–22. doi:10.1039/a704933a.
10. http://en.wikipedia.org/w/index.php?title=Enzyme&oldid=424282616
11. http://en.wikipedia.org/w/index.php?title=Enzyme&oldid=424282616
12. Moss, G.P. "Recommendations of the Nomenclature Committee". International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes by the Reactions they Catalyse. Retrieved 2006-03-14.
13. http://en.wikipedia.org/w/index.php?title=Enzyme&oldid=424282616
14. Lovell SC et al. (2003). "Structure validation by Cα geometry: φ,ψ and Cβ deviation". Proteins 50 (3): 437–450. doi:10.1002/prot.10286. PMID 12557186.
15. http://en.wikipedia.org/w/index.php?title=Protein_primary_structure&oldid=415921787
16. Pauling L, Corey RB, Branson HR (1951). "The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain". Proc Natl Acad Sci USA 37 (4): 205–211. doi:10.1073/pnas.37.4.205. PMID 14816373.
17. Chiang YS, Gelfand TI, Kister AE, Gelfand IM (2007). "New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage.". Proteins. 68 (4): 915–921. doi:10.1002/prot.21473. PMID 17557333.
18. Ramachandran GN, Ramakrishnan C, Sasisekharan V (July 1963). "Stereochemistry of polypeptide chain configurations". J. Mol. Biol. 7: 95–9.
19. http://en.wikipedia.org/w/index.php?title=Alpha_helix&oldid=423162580
20. Hutchinson EG, Thornton JM (April 1993). "The Greek key motif: extraction, classification and analysis". Protein Eng. 6 (3): 233–45. doi:10.1093/protein/6.3.233. PMID 8506258.
21. SCOP: Fold: WW domain-like
22. PPS '96 - Super Secondary Structure
23. Hutchinson, E.; Thornton, J. (1996). "PROMOTIF—A program to identify and analyze structural motifs in proteins". Protein Science 5 (2): 212–220. doi:10.1002/pro.5560050204. PMID 8745398.
24. Hutchinson EG, Thornton JM (1990). "HERA--a program to draw schematic diagrams of protein secondary structures". Proteins 8 (3): 203–12. doi:10.1002/prot.340080303. PMID 2281084.
25. http://en.wikipedia.org/w/index.php?title=Coiled_coil&oldid=427735447
26. Steven Bottomley (2004). "Interactive Protein Structure Tutorial". Retrieved January 9, 2011.
27. http://en.wikipedia.org/w/index.php?title=Protein_tertiary_structure&oldid=422486540
28. http://en.wikipedia.org/wiki/Protein_quaternary_structure
29. Crowfoot Hodgkin D (1935). "X-ray Single Crystal Photographs of Insulin". Nature 135: 591. doi:10.1038/135591a0.
30. Kendrew J. C. et al. (1958-03-08). "A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis". Nature 181 (4610): 662. doi:10.1038/181662a0. PMID 13517261.
31. "PDB Statistics". RCSB Protein Data Bank. Retrieved 2010-02-09.
32. Scapin G (2006). "Structural biology and drug discovery". Curr. Pharm. Des. 12 (17): 2087. doi:10.2174/138161206777585201. PMID 16796557.
33. Lundstrom K (2006). "Structural genomics for membrane proteins". Cell. Mol. Life Sci. 63 (22): 2597. doi:10.1007/s00018-006-6252-y. PMID 17013556.
34. Lundstrom K (2004). "Structural genomics on membrane proteins: mini review". Comb. Chem. High Throughput Screen. 7 (5): 431. PMID 15320710.
35. http://en.wikipedia.org/w/index.php?title=Protein_sequencing&oldid=413170994
36. Niall HD (1973). "Automated Edman degradation: the protein sequenator". Meth. Enzymol. 27: 942–1010. doi:10.1016/S0076-6879(73)27039-8. PMID 4773306.
37. http://en.wikipedia.org/w/index.php?title=Protein_sequencing&oldid=413170994
38. http://en.wikipedia.org/w/index.php?title=Protein_sequencing&oldid=413170994
39. Astrid Sigel, Helmut Sigel and Roland K.O. Sigel, ed (2008). Nickel and Its Surprising Impact in Nature. Metal Ions in Life Sciences. 2. Wiley. ISBN 978-0-470-01671-8.
40. http://en.wikipedia.org/w/index.php?title=Hemeprotein&oldid=410476687
41. Berman, H. M.; et al. (January 2000). "The Protein Data Bank". Nucleic Acids Res. 28 (1): 235–242. doi:10.1093/nar/28.1.235. PMID 10592235. PMC 102472.
42. http://en.wikipedia.org/w/index.php?title=Insulin&oldid=425481933
43. http://en.wikipedia.org/w/index.php?title=Insulin&oldid=425481933