Proteomics/Proteomics and Drug Discovery/Protein Aggregation

Protein Aggregation

Chapter written by: Ashlee Benjamin and Rhea Sanchez


Introduction to Protein Aggregation


Protein aggregation has become a topic of growing interest in recent years, especially in pharmaceutical research. It is often encountered during late research stages or during the manufacturing of biopharmaceuticals. The presence of aggregates can trigger an immune response against the product and can sometimes interfere with the body’s vital functions. Aggregation is also associated with several serious diseases, such as Alzheimer’s disease and Type II diabetes [1]. Antibodies and other proteins developed for research can aggregate when over-expressed, costing companies time and money. Many companies now use bioinformatics techniques to predict aggregation. Such approaches yield only predictions, but they are still very useful for analyses during research and development.

What is Aggregation?


Aggregation and Amyloid Fibrils


Protein aggregation can be described as the fibrilization, or formation of insoluble structures, of completely or partially unfolded peptides [2]. A peptide can bind to itself or to other proteins in the cell in a non-native way. This often occurs when proteins are over-expressed. There is more than one type of protein aggregation; however, the best-studied type is amyloid fibrilization. Amyloid fibrils are composed primarily of beta sheets, and main-chain interactions dominate their structure. These fibrils are highly organized structures stabilized by hydrogen bonds [3]. Amyloid fibrils are usually about 10 nm in diameter and consist of approximately 2-6 “protofibrils” twisted around one another. Protofibrils are the precursors to amyloid fibrils and are thought to be the species most toxic to the cell. The protofibrils have a cross-beta structure, in which beta sheets are assembled from beta strands that run perpendicular to the fibril axis. An image of an amyloid fibril is shown in Figure 1 [4].

Figure 1: Small Amyloid Fibril

Figure 1 - An amyloid fibril composed of two protofibrils twisted around one another. The protofibrils have a cross-beta structure, with the beta strands running perpendicular to the fibril axis.

The native state of a protein, the state that yields proper function, is usually the most energetically favorable form of the protein under physiological conditions. Protein aggregation occurs when a peptide chain makes non-native contacts with other parts of the same protein or with other proteins. Because these contacts persist, aggregates are thought to be as energetically favorable as the native state, or more so. Several diseases are linked to protein aggregation. This discussion will examine Alzheimer's disease and transthyretin-related diseases [4].

Different Types of Aggregates


Many different types of aggregation can occur. They are classified by the type of interaction involved and by solubility. Soluble aggregates are not visible and cannot be removed with a filter. Insoluble aggregates can be removed by filtration and are often visible to the human eye. Both types cause problems in biopharmaceutical development. Covalent aggregates arise when a covalent bond forms between multiple monomers of a given peptide. Disulfide bond formation between free thiols is a common mechanism of covalent aggregation. Oxidation of tyrosine residues can lead to the formation of bityrosine, which often results in aggregation. Reversible protein aggregation typically results from weaker, non-covalent protein interactions. The reversibility of this type of aggregation can change when environmental factors such as protein concentration, salt concentration, or pH are varied [5].

Advancements in Understanding of Protein Folding


To understand protein aggregation, we need to understand more about protein folding in general. It is not feasible to understand and account for every factor involved in aggregation when developing a prediction tool. However, the more that is understood about such factors, the better a prediction model can be. Advances have been made in understanding the protein folding process, and these advances can help explain how the process "goes wrong" and results in aggregation. Figure 2 shows an unfolded peptide chain and its corresponding native, folded state.

Figure 2: Protein Folding

Figure 2 - An unfolded polypeptide and its transition to a fully folded peptide [6].

The general folding process of a protein can be described as a “stochastic search” for the native state [3][7]. The native state is usually a stable, low-energy conformation. More specifically, smaller proteins fold by a nucleation-condensation mechanism, and larger proteins fold in modules or smaller parts [7]. Nucleation-condensation is the formation of a folding nucleus about which the rest of the structure condenses or collapses. As more is understood about the folding process, there will be less need for prediction models such as those described here. In the meantime, however, such models can be created to increase the understanding of aggregation and perhaps of protein folding in general.

Factors Affecting Aggregation


It has been suggested that a few general principles or protein characteristics may govern aggregation and fibrilization [7][2]. Although it has been shown that any generic protein can form aggregates resembling amyloid fibrils under suitable external conditions, aggregation depends on characteristics of the sequence as well. A further complication is that the characteristics of peptide monomers may not give insight into the aggregation propensity of their polymeric forms, because inter-chain interactions may cause conformational changes [2]. Even so, much is known about which factors can be used to predict aggregation. Known intrinsic properties that can affect aggregation include charge, hydrophobicity, hydrophobic/hydrophilic patterns, and secondary structure propensities.

Hydrophobic surfaces that become exposed upon denaturation are highly vulnerable to promiscuous interactions that can lead to protein aggregation. These surfaces act as "sticky" spots and interact with whatever is available. A high alpha-helix propensity significantly decreases the probability of aggregation, whereas a high beta-sheet propensity significantly increases it. Charged residues have also been shown to decrease aggregation. Extrinsic factors such as ionic strength, temperature, pH, protein concentration, chaperones, cellular quality control, and pressure have been shown to have an effect on aggregation [1]. The presence of some ligands and ions can increase aggregation. Stress applied to the protein can result in denaturation, which can lead to subsequent aggregation. Such stresses include freezing, exposure to air, and interactions with metal surfaces, many of which occur during the development of biopharmaceuticals. Because these stresses are part of drug development, a better understanding of aggregation gives better control over that process. Physicochemical properties such as polar and non-polar accessible surface areas, dipole moment, and aromatic residue content are also important [8][9]. It has been shown that inward-pointing charged residues and fewer dangling hydrogen bonds decrease the amount of protein aggregation [2]. Taken together, this knowledge allows us to estimate how likely a given protein is to aggregate. These factors can be, and have been, used to predict protein aggregation.
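Two of the intrinsic properties named above, charge and hydrophobicity, are easy to compute directly from a sequence. The following is a minimal sketch, not the descriptor set of any published predictor. It assumes the Kyte-Doolittle hydropathy scale (the chapter does not say which scale the prediction methods use) and a crude net charge that counts only K/R and D/E; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E.

    Assumption: histidine and the chain termini are ignored.
    """
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def describe(seq):
    """Collect the simple descriptors for one sequence (one-letter codes)."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "net_charge": net_charge(seq),
        "mean_hydropathy": mean_hydropathy(seq),
    }


if __name__ == "__main__":
    # Arbitrary toy peptide, not taken from the chapter.
    print(describe("KVVIIAGDE"))
```

A more hydrophobic, less charged sequence would score higher on `mean_hydropathy` and closer to zero on `net_charge`, which is the direction the text associates with a greater tendency to aggregate.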

When a protein is being expressed, drastic increases in protein concentration can lead to intracellular aggregation. This is either due to interactions of unfolded protein molecules or poor recognition of the nascent chain by molecular chaperones.

Of course, an organism also has natural means of preventing aggregation, and these complicate prediction. Chaperone proteins assist in the folding of proteins. Gatekeeper residues may also help prevent aggregation [1][3][7][2]. Factors such as these are nearly impossible to incorporate when creating a prediction model.


Alzheimer’s Disease


Alzheimer's disease is a neurodegenerative disease linked to the aggregation of the beta-amyloid peptide. The beta-amyloid peptide is prone to aggregation in the human body as well as under experimental conditions. The native state of this peptide is relatively unstable, which favors aggregation. Aggregation of the peptide in the brain leads to the formation of plaques. Figure 3 shows these plaques in the cerebral cortex of an Alzheimer’s patient.

Figure 3: Beta-amyloid Plaques

Figure 3 - Beta-amyloid plaques in the cerebral cortex of a patient with Alzheimer’s disease [10].

Beta-amyloid monomers are thought to form a hexagonal arrangement, and then proceed to form amyloid fibrils. The exact structure of the intermediate steps has not been determined [4].


Transthyretin-Related Diseases


Transthyretin, or TTR, is a protein whose aggregation causes senile systemic amyloidosis, familial amyloid polyneuropathy, and some other rare neurodegenerative diseases. Transthyretin is found in the central nervous system, where it is present in the cerebrospinal fluid and carries thyroxine, a thyroid hormone. In the blood, albumin usually carries out this function, but albumin is not present in cerebrospinal fluid. Many of the TTR diseases have been shown to be caused by point mutations in the peptide sequence. These mutations change the intrinsic properties of the sequence and increase its propensity for aggregation. TTR is a tetramer, and aggregation begins when the tetramer dissociates. After dissociation, the peptide chains make energetically favorable contacts that promote fibrilization. From this point, aggregation of the TTR monomers proceeds rapidly. As with beta-amyloid, the intermediate structures in TTR aggregation have not been determined [4].

Created Prediction Models


In an effort to better understand and predict protein aggregation, several research groups have been developing predictive models based on different principles.

The prediction method created by Tartaglia et al. is based on physicochemical properties. It is an ab initio method that uses polar and non-polar accessible surface areas, dipole moment, charge, aromatic residues, and beta-sheet propensity. The prediction gives an aggregation rate and an “amyloid spectrum.” The method scans each sequence with a window of fixed size, moving the window one amino acid at a time. Each stretch is ranked by its aggregation propensity, and the positions of the three highest-ranking stretches are stored. The authors report a 95% correlation with experimental aggregation rates [8].
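The scan-and-rank step described above can be sketched in a few lines. This is only an illustration of the windowing logic: `score_fn` is a hypothetical stand-in for the physicochemical propensity function of Tartaglia et al., which is not reproduced here, and the default window size is an arbitrary assumption.

```python
def top_stretches(seq, score_fn, window=5, n_top=3):
    """Slide a fixed-size window along seq one residue at a time,
    score every stretch, and return the n_top best as
    (start_index, stretch, score) tuples, best first.

    score_fn is a placeholder: any callable mapping a stretch to a number.
    Ties are broken in favour of the stretch nearer the N-terminus.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scored = [
        (score_fn(seq[i:i + window]), i)
        for i in range(len(seq) - window + 1)
    ]
    # Highest score first; for equal scores, lowest start index first.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, seq[i:i + window], s) for s, i in scored[:n_top]]


if __name__ == "__main__":
    # Toy score: number of valines in the stretch (illustration only).
    for start, stretch, score in top_stretches(
        "AAVVVAA", lambda s: s.count("V"), window=3
    ):
        print(start, stretch, score)
```

Passing the scoring function as an argument keeps the windowing separate from the choice of propensity model, so a different score can be substituted without touching the scan.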

The AGGRESCAN method is based on experimentally derived aggregation propensities of individual amino acids. A sliding window is used to calculate the average aggregation propensity of the window and the value is assigned to the center amino acid. This gives a profile of “hot spots” for aggregation. The method gives a graphical representation of the aggregation profile including hot spots, peak areas, and aggregation values [11].
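The window-averaging idea can be illustrated as follows. This is a sketch of the general technique rather than the AGGRESCAN implementation: the per-residue `scale` is supplied by the caller (the toy values in the example are invented, not the experimentally derived AGGRESCAN propensities), and leaving the end positions unscored, the threshold, and the minimum run length are all assumptions.

```python
def window_profile(seq, scale, window=5):
    """Average the per-residue scale over a sliding window and assign
    the average to the window's central residue.

    Positions too close to either end to be the centre of a full
    window are left as None (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = [None] * len(seq)
    for center in range(half, len(seq) - half):
        stretch = seq[center - half:center + half + 1]
        profile[center] = sum(scale[aa] for aa in stretch) / window
    return profile


def hot_spots(profile, threshold, min_len=1):
    """Return (start, end) index pairs, inclusive and 0-based, for runs of
    at least min_len consecutive residues whose value exceeds threshold."""
    spots, start = [], None
    # The trailing None acts as a sentinel that closes a run at the end.
    for i, value in enumerate(list(profile) + [None]):
        above = value is not None and value > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len:
                spots.append((start, i - 1))
            start = None
    return spots


if __name__ == "__main__":
    toy_scale = {"A": 0.0, "V": 1.0}  # invented values for illustration
    profile = window_profile("AAVVVAA", toy_scale, window=3)
    print(profile)
    print(hot_spots(profile, threshold=0.5, min_len=2))
```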

The TANGO method is a statistical mechanics algorithm that identifies regions of a sequence that are prone to aggregation. The method considers several different intrinsic structural tendencies of the polypeptide sequence and determines the most likely conformation for a sliding window of varying length. It was between 87% and 92% correct in predicting aggregation in general but, like most aggregation prediction tools, it does not distinguish amyloid formation from amorphous aggregation [12].

The method created by Pawar et al. uses intrinsic properties of a sequence, namely alpha-helix propensity, beta-sheet propensity, charge, hydrophobicity, and hydrophobic/hydrophilic patterns, to find aggregation-prone and aggregation-susceptible regions. An overall aggregation propensity, a z-score for propensity, an aggregation rate, and a rate or propensity profile can be calculated with this method. When calculating a profile, each amino acid is mutated to all other possibilities and the rate or propensity is calculated. The maximum, minimum, and wild-type values are stored, and these profiles are smoothed over a sliding window of 7 amino acids [13].
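The mutational profile described above can be sketched as a loop over positions and substitutions. This is one reading of the description, not the published algorithm: `propensity_fn` is a hypothetical stand-in for the Pawar et al. propensity or rate equation, the wild-type value is taken to be the score of the unmutated sequence, and the use of shortened windows at the sequence ends during smoothing is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutation_profile(seq, propensity_fn):
    """For each position, substitute every other amino acid, score the
    mutant with propensity_fn, and keep (max, min, wild_type).

    propensity_fn is a placeholder: any callable mapping a sequence
    to a number.
    """
    wild_type = propensity_fn(seq)
    rows = []
    for i, original in enumerate(seq):
        values = [
            propensity_fn(seq[:i] + aa + seq[i + 1:])
            for aa in AMINO_ACIDS
            if aa != original
        ]
        rows.append((max(values), min(values), wild_type))
    return rows


def smooth(values, window=7):
    """Moving average over a centred window; the window is shortened
    at the sequence ends (an assumption of this sketch)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def smoothed_profile(seq, propensity_fn, window=7):
    """Smooth the max, min, and wild-type tracks separately."""
    rows = mutation_profile(seq, propensity_fn)
    maxima, minima, wild = (list(track) for track in zip(*rows))
    return {
        "max": smooth(maxima, window),
        "min": smooth(minima, window),
        "wild_type": smooth(wild, window),
    }


if __name__ == "__main__":
    # Toy propensity: fraction of valines (illustration only).
    toy = lambda s: s.count("V") / len(s)
    print(smoothed_profile("AVAVVAAVA", toy))
```

Positions where the maximum track rises well above the wild-type track are ones where a substitution could raise the score substantially, which is one way to read the "aggregation-susceptible" label used in the text.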

Can the sequence characteristics of a peptide provide insights into the sites that harbor amyloid tendencies? Known diseases associated with aggregation give us a research basis upon which to start pattern searching. From what has been seen with bioinformatics approaches in this area so far, sequence characteristics do tell us a lot about the aggregation of a protein. However, no perfect model has been built, so it is clear that we do not understand every aspect of protein aggregation. The hope is to understand protein aggregation and to learn how to prevent it.


  1. "Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains." Dubay, K. F., Pawar, A. P., Chiti, F., Zurdo, J., Dobson, C. M., & Vendruscolo, M. (2004). J. Mol. Biol. 341, 1317-1326.
  2. "Emerging ideas on the molecular basis of protein and peptide aggregation." Thirumalai, D., Klimov, D. K., & Dima, R. I. (2003). Current Opinion in Structural Biology. 13, 146-159.
  3. "Experimental investigation of protein folding and misfolding." Dobson, C. M. (2004). Methods. 34, 4-14.
  4. Accessed 27 March 2008.
  5. Protein Aggregation and Bioprocessing. Accessed 21 March 2008.
  6. “Protein Folding.” Accessed 2 April 2008.
  7. "Principles of protein folding, misfolding and aggregation." Dobson, C. M. (2004). Seminars in Cell & Developmental Biology. 15, 3-16.
  8. "The role of aromaticity, exposed surface, and dipole moment in determining protein aggregation rates." Tartaglia, G. G., Cavalli, A., Pellarin, R., & Caflisch, A. (2004). Protein Sci. 13, 1939-1941.
  9. Tartaglia, G. G., Cavalli, A., Pellarin, R., & Caflisch, A. (2004). Protein Sci. 14, 2723-2734.
  10. “Alzheimer’s Disease.” Accessed 2 April 2008.
  11. "AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides." Conchillo-Sole, O., de Groot, N. S., Aviles, F. X., Vendrell, J., Daura, X., & Ventura, S. (2007). BMC Bioinformatics. 8, 65-81.
  12. "Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins." Fernandez-Escamilla, A. M., Rousseau, F., Schymkowitz, J., & Serrano, L. (2004). Nat. Biotechnol. 22, 1302-1306.
  13. "Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases." Pawar, A. P., Dubay, K. F., Zurdo, J., Chiti, F., Vendruscolo, M., & Dobson, C. M. (2005). J. Mol. Biol. 350, 379-392.