Protein structure determination

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method allows one to measure the three-dimensional (3-D) density distribution of electrons in the protein, in the crystallized state, and thereby infer the 3-D coordinates of all the atoms to a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance techniques. The secondary structure composition can be determined via circular dichroism. Vibrational spectroscopy can also be used to characterize the conformation of peptides, polypeptides, and proteins. Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (less than 5 angstroms, or 0.5 nanometers), and is anticipated to increase in power as a tool for high-resolution work in the next decade.

X-ray crystallography

Workflow for solving the structure of a molecule by X-ray crystallography.

X-ray crystallography is a tool used for identifying the atomic and molecular structure of a crystal, in which the crystalline atoms cause a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal. From this electron density, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, their disorder and various other information.

In a single-crystal X-ray diffraction measurement, a crystal is mounted on a goniometer. The goniometer is used to position the crystal at selected orientations. The crystal is bombarded with a finely focused monochromatic beam of X-rays, producing a diffraction pattern of regularly spaced spots known as reflections. The two-dimensional images taken at different rotations are converted into a three-dimensional model of the density of electrons within the crystal using the mathematical method of Fourier transforms, combined with chemical data known for the sample. Poor resolution (fuzziness) or even errors may result if the crystals are too small, or not uniform enough in their internal makeup.

Crystallization

Crystallography generally requires a pure crystal of high regularity to solve the structure of a complicated arrangement of atoms. Protein crystals are almost always grown in solution. The most common approach is to lower the solubility of its component molecules very gradually; if this is done too quickly, the molecules will precipitate from solution, forming a useless dust or amorphous gel on the bottom of the container. Crystal growth in solution is characterized by two steps: nucleation of a microscopic crystallite (possibly having only 100 molecules), followed by growth of that crystallite, ideally to a diffraction-quality crystal. The solution conditions that favor the first step (nucleation) are not always the same conditions that favor the second step (subsequent growth). The crystallographer's goal is to identify solution conditions that favor the development of a single, large crystal, since larger crystals offer improved resolution of the molecule. Consequently, the solution conditions should disfavor the first step (nucleation) but favor the second (growth), so that only one large crystal forms per droplet. If nucleation is favored too much, a shower of small crystallites will form in the droplet, rather than one large crystal; if favored too little, no crystal will form whatsoever.

It is extremely difficult to predict good conditions for nucleation or growth of well-ordered crystals. In practice, favorable conditions are identified by screening: a very large batch of the molecules is prepared, and a wide variety of crystallization solutions are tested. Hundreds, even thousands, of solution conditions are generally tried before a successful one is found. The various conditions can use one or more physical mechanisms to lower the solubility of the molecule; for example, some may change the pH, some contain salts of the Hofmeister series or chemicals that lower the dielectric constant of the solution, and still others contain large polymers such as polyethylene glycol that drive the molecule out of solution by entropic effects. It is also common to try several temperatures for encouraging crystallization, or to gradually lower the temperature so that the solution becomes supersaturated. These methods require large amounts of the target molecule, as they use high concentrations of the molecule(s) to be crystallized. Because such large quantities (milligrams) of crystallization-grade protein are difficult to obtain, robots have been developed that can accurately dispense crystallization trial drops on the order of 100 nanoliters in volume. This means that 10-fold less protein is used per experiment than in crystallization trials set up by hand (on the order of 1 microliter).

Several factors are known to inhibit or mar crystallization. The growing crystals are generally held at a constant temperature and protected from shocks or vibrations that might disturb their crystallization. Impurities in the molecules or in the crystallization solutions are often inimical to crystallization. Conformational flexibility in the molecule also tends to make crystallization less likely, due to entropy. Having failed to crystallize a target molecule, a crystallographer may try again with a slightly modified version of the molecule; even small changes in molecular properties can lead to large differences in crystallization behavior.

Data collection

The crystal is mounted for measurements so that it may be held in the X-ray beam and rotated. Protein crystals are scooped up by a loop, then flash-frozen with liquid nitrogen. This freezing reduces the radiation damage of the X-rays, as well as the noise in the Bragg peaks due to thermal motion (the Debye-Waller effect). However, untreated protein crystals often crack if flash-frozen; therefore, they are generally pre-soaked in a cryoprotectant solution before freezing. Unfortunately, this pre-soak may itself cause the crystal to crack, ruining it for crystallography. Generally, successful cryo-conditions are identified by trial and error.

A diffractometer.

The capillary or loop is mounted on a goniometer, which allows it to be positioned accurately within the X-ray beam and rotated. Since both the crystal and the beam are often very small, the crystal must be centered within the beam to within ~25 micrometers accuracy, which is aided by a camera focused on the crystal. The most common type of goniometer is the "kappa goniometer", which offers three angles of rotation: the ω angle, which rotates about an axis perpendicular to the beam; the κ angle, about an axis at ~50° to the ω axis; and, finally, the φ angle about the loop/capillary axis. The oscillations carried out during data collection (mentioned below) involve the ω axis only.

The mounted crystal is then irradiated with a beam of monochromatic X-rays. The brightest and most useful X-ray sources are synchrotrons; their much higher luminosity allows for better resolution. They also make it convenient to tune the wavelength of the radiation, which is useful for multi-wavelength anomalous dispersion phasing, described below. Synchrotrons are generally national facilities, each with several dedicated beamlines where data is collected around the clock, seven days a week.

Smaller X-ray generators are often used in laboratories to check the quality of crystals before bringing them to a synchrotron and sometimes to solve a crystal structure.

X-rays are generally filtered (by use of X-ray filters) to a single wavelength (made monochromatic) and collimated to a single direction before they are allowed to strike the crystal. The filtering not only simplifies the data analysis, but also removes radiation that degrades the crystal without contributing useful information.

An X-ray diffraction pattern of a crystallized enzyme. The pattern of spots (reflections) and the relative strength of each spot (intensities) can be used to determine the structure of the enzyme.

When a crystal is mounted and exposed to an intense beam of X-rays, it scatters the X-rays into a pattern of spots or reflections that can be observed on a screen behind the crystal. A similar pattern may be seen by shining a laser pointer at a compact disc. The relative intensities of these spots provide the information to determine the arrangement of molecules within the crystal in atomic detail. The intensities of these reflections may be recorded with photographic film, an area detector or with a charge-coupled device (CCD) image sensor. The peaks at small angles correspond to low-resolution data, whereas those at high angles represent high-resolution data; thus, an upper limit on the eventual resolution of the structure can be determined from the first few images. Some measures of diffraction quality can be determined at this point, such as the mosaicity of the crystal and its overall disorder, as observed in the peak widths. Some pathologies of the crystal that would render it unfit for solving the structure can also be diagnosed quickly at this point.
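The link between scattering angle and resolution follows from Bragg's law, λ = 2d sin θ, which the passage above relies on without stating. The following is a minimal sketch, assuming a flat detector perpendicular to the beam; the function name and the geometry values (wavelength, crystal-to-detector distance, spot radii) are hypothetical and chosen only for illustration.

```python
import math

def resolution_from_spot(radius_mm, distance_mm, wavelength_A):
    """Return the d-spacing (in angstroms) of a spot on a flat detector.

    The angle between the direct beam and the diffracted ray is 2*theta;
    Bragg's law with n = 1 then gives d = lambda / (2 * sin(theta)).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_A / (2.0 * math.sin(two_theta / 2.0))

# Hypothetical geometry: 1.0 A X-rays, detector 200 mm behind the crystal.
for radius in (20.0, 60.0, 120.0):
    d = resolution_from_spot(radius, 200.0, 1.0)
    print(f"spot {radius:5.1f} mm from the beam centre: d = {d:.2f} A")
```

Spots farther from the beam centre give smaller d values, which is why the outermost visible reflections set the upper limit on the resolution of the final structure.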

One image of spots is insufficient to reconstruct the whole crystal; it represents only a small slice of the full Fourier transform. To collect all the necessary information, the crystal must be rotated step-by-step through 180°, with an image recorded at every step. Actually, slightly more than 180° is required to cover reciprocal space, due to the curvature of the Ewald sphere. However, if the crystal has a higher symmetry, a smaller angular range such as 90° or 45° may be recorded. The rotation axis should be changed at least once, to avoid developing a "blind spot" in reciprocal space close to the rotation axis. It is customary to rock the crystal slightly (by 0.5–2°) to catch a broader region of reciprocal space.
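The number of images in a data set follows directly from the rotation range and the oscillation width per image. A small sketch of that arithmetic, with a hypothetical helper name and example values taken from the ranges quoted above:

```python
import math

def images_needed(rotation_range_deg, oscillation_deg):
    """Number of frames needed to cover a rotation range in fixed steps."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(rotation_range_deg / oscillation_deg, 9))

for label, rotation_range in (("no symmetry", 180.0),
                              ("higher symmetry", 90.0),
                              ("still higher symmetry", 45.0)):
    print(f"{label}: {images_needed(rotation_range, 0.5)} images at 0.5 degrees each")
```

This ignores the small extra range needed because of the curvature of the Ewald sphere, so it should be read as a lower bound.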

Multiple data sets may be necessary for certain phasing methods. For example, MAD phasing requires that the scattering be recorded at at least three (and usually four, for redundancy) wavelengths of the incoming X-ray radiation. A single crystal may degrade too much during the collection of one data set, owing to radiation damage; in such cases, data sets must be collected from multiple crystals.

Data analysis

Crystal symmetry, unit cell, and image scaling

The recorded series of two-dimensional diffraction patterns, each corresponding to a different crystal orientation, is converted into a three-dimensional model of the electron density. The conversion uses the mathematical technique of Fourier transforms. Each spot corresponds to a different type of variation in the electron density; the crystallographer must determine which variation corresponds to which spot (indexing), the relative strengths of the spots in different images (merging and scaling) and how the variations should be combined to yield the total electron density (phasing).

Data processing begins with indexing the reflections. This means identifying the dimensions of the unit cell and which image peak corresponds to which position in reciprocal space. A byproduct of indexing is to determine the symmetry of the crystal, i.e., its space group. Some space groups can be eliminated from the beginning. For example, reflection symmetries cannot be observed in chiral molecules; thus, only 65 of the 230 possible space groups are allowed for protein molecules, which are almost always chiral. Indexing is generally accomplished using an autoindexing routine. Having assigned symmetry, the data is then integrated. This converts the hundreds of images containing the thousands of reflections into a single file, consisting of (at the very least) records of the Miller index of each reflection and an intensity for each reflection. At this stage the file often also includes error estimates and measures of partiality (what part of a given reflection was recorded on that image).

A full data set may consist of hundreds of separate images taken at different orientations of the crystal. The first step is to merge and scale these various images, that is, to identify which peaks appear in two or more images (merging) and to scale the relative images so that they have a consistent intensity scale. Optimizing the intensity scale is critical because the relative intensity of the peaks is the key information from which the structure is determined. The repetitive technique of crystallographic data collection and the often high symmetry of crystalline materials cause the diffractometer to record many symmetry-equivalent reflections multiple times. This allows calculating the symmetry-related R-factor, a reliability index based upon how similar the measured intensities of symmetry-equivalent reflections are, thus assessing the quality of the data.
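The symmetry-related R-factor is conventionally written as R_merge = Σ_hkl Σ_i |I_i − ⟨I⟩| / Σ_hkl Σ_i I_i, where ⟨I⟩ is the mean intensity of all measurements of one unique reflection. That formula is the usual textbook definition rather than something stated above, and the sketch below assumes that symmetry-equivalent reflections have already been mapped to a common unique Miller index. The observation list is invented for illustration.

```python
from collections import defaultdict

def r_merge(observations):
    """Compute R_merge from (unique_hkl, intensity) pairs.

    Reflections measured only once are skipped, because they carry no
    information about the agreement between repeated measurements.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no reflection was measured more than once")
    return numerator / denominator

# Hypothetical scaled intensities for three unique reflections.
observations = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((1, 1, 0), 50.0), ((1, 1, 0), 46.0), ((1, 1, 0), 54.0),
    ((2, 0, 1), 10.0),
]
print(f"R_merge = {r_merge(observations):.3f}")
```

A lower value means that repeated measurements of equivalent reflections agree more closely.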

Initial phasing

The data collected from a diffraction experiment is a reciprocal space representation of the crystal lattice. The position of each diffraction 'spot' is governed by the size and shape of the unit cell, and the inherent symmetry within the crystal. The intensity of each diffraction 'spot' is recorded, and this intensity is proportional to the square of the structure factor amplitude. The structure factor is a complex number containing information relating to both the amplitude and phase of a wave. In order to obtain an interpretable electron density map, both amplitude and phase must be known (an electron density map allows a crystallographer to build a starting model of the molecule). The phase cannot be directly recorded during a diffraction experiment: this is known as the phase problem. Initial phase estimates can be obtained in a variety of ways:

Ab initio phasing or direct methods: This is usually the method of choice for small molecules (<1000 non-hydrogen atoms), and has been used successfully to solve the phase problems for small proteins. If the resolution of the data is better than 1.4 Å, direct methods can be used to obtain phase information by exploiting known phase relationships between certain groups of reflections.
Molecular replacement: If a related structure is known, it can be used as a search model in molecular replacement to determine the orientation and position of the molecules within the unit cell. The phases obtained this way can be used to generate electron density maps.
Anomalous X-ray scattering (MAD or SAD phasing): The X-ray wavelength may be scanned past an absorption edge of an atom, which changes the scattering in a known way. By recording full sets of reflections at three different wavelengths (far below, far above and in the middle of the absorption edge) one can solve for the substructure of the anomalously diffracting atoms and thence the structure of the whole molecule. The most popular method of incorporating anomalous scattering atoms into proteins is to express the protein in a methionine auxotroph (a host incapable of synthesizing methionine) in a medium rich in selenomethionine, which contains selenium atoms. A MAD experiment can then be conducted around the absorption edge of selenium, which should yield the positions of the selenomethionine residues within the protein, providing initial phases.
Heavy atom methods (multiple isomorphous replacement): If electron-dense metal atoms can be introduced into the crystal, direct methods or Patterson-space methods can be used to determine their location and to obtain initial phases. Such heavy atoms can be introduced either by soaking the crystal in a heavy atom-containing solution, or by co-crystallization (growing the crystals in the presence of a heavy atom). As in MAD phasing, the changes in the scattering amplitudes can be interpreted to yield the phases. Although this is the original method by which protein crystal structures were solved, it has largely been superseded by MAD phasing with selenomethionine.
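The phase problem can be illustrated with a one-dimensional toy model. The sketch below is not a crystallographic program: it assumes a hypothetical "unit cell" of 16 grid points containing two point-like atoms, computes structure factors with a plain discrete Fourier transform, and then rebuilds the density once with the true phases and once with all phases set to zero, which is what amplitudes alone would allow.

```python
import cmath

def structure_factors(density):
    """Discrete Fourier transform of a 1-D density: one complex F per index h."""
    n = len(density)
    return [sum(rho * cmath.exp(-2j * cmath.pi * h * x / n)
                for x, rho in enumerate(density))
            for h in range(n)]

def fourier_synthesis(amplitudes, phases):
    """Rebuild the density from structure-factor amplitudes and phases."""
    n = len(amplitudes)
    factors = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
    return [sum(f * cmath.exp(2j * cmath.pi * h * x / n)
                for h, f in enumerate(factors)).real / n
            for x in range(n)]

# Hypothetical 1-D cell with two "atoms" of different electron count.
density = [0.0] * 16
density[3] = 6.0
density[10] = 8.0

F = structure_factors(density)
amplitudes = [abs(f) for f in F]            # measurable: square root of intensity
true_phases = [cmath.phase(f) for f in F]   # not measurable in the experiment

with_phases = fourier_synthesis(amplitudes, true_phases)
without_phases = fourier_synthesis(amplitudes, [0.0] * len(F))

print("true density   :", [round(v, 2) for v in density])
print("with phases    :", [round(v, 2) for v in with_phases])
print("phases set to 0:", [round(v, 2) for v in without_phases])
```

With the correct phases the two atoms reappear at their positions; with the phases discarded the map no longer resembles the original density, which is why one of the phasing methods listed above is needed.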

Model building and phase refinement

Having obtained initial phases, an initial model can be built. This model can be used to refine the phases, leading to an improved model, and so on. Given a model of some atomic positions, these positions and their respective Debye-Waller factors (or B-factors, accounting for the thermal motion of the atom) can be refined to fit the observed diffraction data, ideally yielding a better set of phases. A new model can then be fit to the new electron density map and a further round of refinement is carried out. This continues until the correlation between the diffraction data and the model is maximized. The agreement is measured by an R-factor defined as

R={\frac {\sum {||F_{\text{obs}}|-|F_{\text{calc}}||}}{\sum {|F_{\text{obs}}|}}}

where F is the structure factor. A similar quality criterion is Rfree, which is calculated from a subset (~10%) of reflections that were not included in the structure refinement. Both R factors depend on the resolution of the data. As a rule of thumb, Rfree should be approximately the resolution in angstroms divided by 10; thus, a data-set with 2 Å resolution should yield a final Rfree ~ 0.2. Chemical bonding features such as stereochemistry, hydrogen bonding and distribution of bond lengths and angles are complementary measures of the model quality. Phase bias is a serious problem in such iterative model building. Omit maps are a common technique used to check for this.
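The R-factor formula and the idea of a held-out Rfree set can be sketched as follows. The function names, the random selection of the free set and the example amplitudes are assumptions made for illustration; refinement programs have their own conventions for flagging free reflections.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Return (work_indices, free_indices); the free set is kept out of refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(free_fraction * n_reflections))
    return sorted(indices[n_free:]), sorted(indices[:n_free])

def rule_of_thumb_rfree(resolution_A):
    """Rough target from the text: Rfree is about resolution / 10."""
    return resolution_A / 10.0

# Hypothetical observed and model-derived amplitudes.
f_obs = [120.0, 85.0, 60.0, 42.0, 30.0, 25.0, 18.0, 12.0, 9.0, 5.0]
f_calc = [110.0, 90.0, 55.0, 45.0, 33.0, 22.0, 20.0, 10.0, 8.0, 6.0]

work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}, "
      f"target at 2 A about {rule_of_thumb_rfree(2.0):.1f}")
```

Because the free reflections never influence the model, a large gap between the two values is a warning sign of overfitting.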

It may not be possible to observe every atom of the crystallized molecule – it must be remembered that the resulting electron density is an average of all the molecules within the crystal. In some cases, there is too much residual disorder in those atoms, and the resulting electron density for atoms existing in many conformations is smeared to such an extent that it is no longer detectable in the electron density map. Weakly scattering atoms such as hydrogen are routinely invisible. It is also possible for a single atom to appear multiple times in an electron density map, e.g., if a protein sidechain has multiple (< 4) allowed conformations. In still other cases, the crystallographer may detect that the covalent structure deduced for the molecule was incorrect, or changed. For example, proteins may be cleaved or undergo post-translational modifications that were not detected prior to the crystallization.

NMR spectroscopy

A 900 MHz NMR instrument with a 21.1 T magnet at HWB-NMR, Birmingham, UK.

Nuclear magnetic resonance spectroscopy, most commonly known as NMR spectroscopy, is a research technique that exploits the magnetic properties of certain atomic nuclei. It determines the physical and chemical properties of atoms or the molecules in which they are contained. It relies on the phenomenon of nuclear magnetic resonance and can provide detailed information about the structure, dynamics, reaction state, and chemical environment of molecules. The intramolecular magnetic field around an atom in a molecule changes the resonance frequency, thus giving access to details of the electronic structure of a molecule. The measurement of these properties provides a map of how the atoms are linked chemically, how close they are in space, and how rapidly they move with respect to each other.

These properties are fundamentally the same as those used in the more familiar magnetic resonance imaging (MRI), but the molecular applications use a somewhat different approach, appropriate to the change of scale from millimeters (of interest to radiologists) to nanometers (bonded atoms are typically a fraction of a nanometer apart), a factor of a million. This change of scale requires much higher sensitivity of detection and stability for long-term measurement. In contrast to MRI, structural biology studies do not directly generate an image, but rely on complex computer calculations to generate three-dimensional molecular models.

Currently most samples are examined in a solution in water, but methods are being developed to also work with solid samples. Data collection relies on placing the sample inside a powerful magnet, sending radio frequency signals through the sample, and measuring the absorption of those signals. Depending on the environment of atoms within the protein, the nuclei of individual atoms will absorb different frequencies of radio signals. Furthermore, the absorption signals of different nuclei may be perturbed by adjacent nuclei. This information can be used to determine the distance between nuclei. These distances in turn can be used to determine the overall structure of the protein.

Basic NMR techniques

When placed in a magnetic field, NMR active nuclei (such as 1H or 13C) absorb electromagnetic radiation at a frequency characteristic of the isotope. The resonant frequency and the energy of the absorption are proportional to the strength of the magnetic field, and the intensity of the signal also increases with field strength.
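The proportionality between field strength and resonance frequency is usually written ν = (γ/2π)·B, where γ is the gyromagnetic ratio of the isotope. The sketch below uses approximate literature values of γ/2π for 1H and 13C; these constants are not given in the text and are rounded.

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla
# (rounded literature values, assumed here for illustration).
GAMMA_BAR_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
}

def larmor_mhz(isotope, field_tesla):
    """Resonance frequency in MHz for a given isotope and magnetic field."""
    return GAMMA_BAR_MHZ_PER_T[isotope] * field_tesla

for isotope in ("1H", "13C"):
    print(f"{isotope} at 21.1 T: {larmor_mhz(isotope, 21.1):.1f} MHz")
```

For 1H this reproduces the nominal "900 MHz" label of a 21.1 T instrument; spectrometers are conventionally named after their proton frequency.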

Acquisition of spectra

Upon excitation of the sample with a radio frequency pulse, a nuclear magnetic resonance response, the free induction decay (FID), is obtained. It is a very weak signal and requires sensitive radio receivers to pick up. A Fourier transform is applied to extract the frequency-domain spectrum from the raw time-domain FID. A spectrum from a single FID has a low signal-to-noise ratio, but it improves readily with averaging of repeated acquisitions. Good 1H NMR spectra can be acquired with 16 repeats, which takes only minutes. However, for elements heavier than hydrogen, the relaxation time is rather long, e.g. around 8 seconds for 13C. Thus, acquisition of quantitative heavy-element spectra can be time-consuming, taking tens of minutes to hours. If the second excitation pulse is sent prematurely, before the relaxation is complete, the average magnetization vector has not yet returned to its equilibrium direction along the field, giving suboptimal absorption and emission of the pulse.
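The two steps described here, Fourier transformation of the FID and averaging of repeated scans, can be sketched with a synthetic signal. Everything in the example is assumed for illustration: a single resonance at 125 Hz offset, a 50 ms decay constant, 128 points sampled every millisecond, Gaussian noise, and a plain (slow) discrete Fourier transform in place of an FFT library.

```python
import cmath
import math
import random

def make_fid(freq_hz, t2_s, n_points, dwell_s, noise_sd=0.0, rng=None):
    """Simulate a complex FID: a decaying oscillation plus Gaussian noise."""
    if rng is None:
        rng = random.Random(0)
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        signal = cmath.exp(2j * math.pi * freq_hz * t) * math.exp(-t / t2_s)
        noise = complex(rng.gauss(0.0, noise_sd), rng.gauss(0.0, noise_sd))
        fid.append(signal + noise)
    return fid

def dft(samples):
    """Naive discrete Fourier transform (time domain to frequency domain)."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * m / n)
                for m, s in enumerate(samples))
            for k in range(n)]

def average_scans(scans):
    """Point-by-point average of repeated acquisitions."""
    return [sum(points) / len(scans) for points in zip(*scans)]

def rms_error(fid, reference):
    """Root-mean-square deviation of an FID from a noise-free reference."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fid, reference)) / len(fid))

clean = make_fid(125.0, 0.05, 128, 0.001)
rng = random.Random(1)
scans = [make_fid(125.0, 0.05, 128, 0.001, noise_sd=1.0, rng=rng) for _ in range(16)]
averaged = average_scans(scans)

spectrum = [abs(value) for value in dft(averaged)]
peak_bin = spectrum.index(max(spectrum))
print(f"strongest line near {peak_bin * 1000.0 / 128:.1f} Hz")
print(f"noise in one scan: {rms_error(scans[0], clean):.2f}, "
      f"after 16 scans: {rms_error(averaged, clean):.2f}")
```

The signal adds coherently while random noise partly cancels, so the noise level falls roughly as the square root of the number of scans (about fourfold for 16 repeats).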

Chemical shift

A spinning charge generates a magnetic field that results in a magnetic moment proportional to the spin. In the presence of an external magnetic field, two spin states exist (for a spin 1/2 nucleus): one spin up and one spin down, where one aligns with the magnetic field and the other opposes it. The difference in energy (ΔE) between the two spin states increases as the strength of the field increases, but this difference is usually very small, leading to the requirement for strong NMR magnets (1–20 T for modern NMR instruments). Irradiation of the sample with energy corresponding to the exact spin state separation of a specific set of nuclei will cause excitation of the nuclei of that set from the lower energy state to the higher energy state.
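How small the energy difference is can be made concrete with ΔE = hν and the Boltzmann distribution. The Boltzmann treatment is an addition to the text, offered as a rough sketch for an isolated spin-1/2 nucleus at thermal equilibrium; the function names are hypothetical and the constants are the SI defined values.

```python
import math

PLANCK_J_S = 6.62607015e-34       # Planck constant
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant

def spin_state_gap_joule(resonance_hz):
    """Energy difference between the two spin states, delta E = h * nu."""
    return PLANCK_J_S * resonance_hz

def polarization(resonance_hz, temperature_k):
    """Fractional population excess of the lower state for spin 1/2.

    (N_lower - N_upper) / (N_lower + N_upper) = tanh(delta E / (2 k T)).
    """
    gap = spin_state_gap_joule(resonance_hz)
    return math.tanh(gap / (2.0 * BOLTZMANN_J_PER_K * temperature_k))

# Protons at a nominal 900 MHz and at 100 MHz, both at room temperature.
for freq_mhz in (100.0, 900.0):
    excess = polarization(freq_mhz * 1e6, 298.0)
    print(f"{freq_mhz:5.0f} MHz: excess of about {excess * 1e6:.0f} spins per million")
```

Even at the highest fields only a few tens of nuclei per million contribute net signal, which is why stronger magnets directly improve sensitivity.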

For spin 1/2 nuclei, the energy difference between the two spin states at a given magnetic field strength is proportional to their magnetic moment. However, even if all protons have the same magnetic moments, they do not give resonant signals at the same frequency values. This difference arises from the differing electronic environments of the nucleus of interest. Upon application of an external magnetic field, these electrons move in response to the field and generate local magnetic fields that oppose the much stronger applied field. This local field thus "shields" the proton from the applied magnetic field, which must therefore be increased in order to achieve resonance (absorption of rf energy). Such increments are very small, usually in parts per million (ppm). For instance, the proton peak from an aldehyde is shifted ca. 10 ppm compared to a hydrocarbon peak, since as an electron-withdrawing group, the carbonyl deshields the proton by reducing the local electron density.

Given that the location of different NMR signals is dependent on the external magnetic field strength and the reference frequency, the signals are usually reported relative to a reference signal, usually that of TMS (tetramethylsilane). Additionally, since the distribution of NMR signals is field dependent, these frequencies are divided by the spectrometer frequency. However, since we are dividing Hz by MHz, the resulting number would be too small, and thus it is multiplied by a million. This operation therefore gives a locator number called the chemical shift with units of parts per million. To detect such small frequency differences the applied magnetic field must be constant throughout the sample volume. High resolution NMR spectrometers use shims to adjust the homogeneity of the magnetic field to parts per billion (ppb) in a volume of a few cubic centimeters. In general, chemical shifts for protons are highly predictable since the shifts are primarily determined by simpler shielding effects (electron density), but the chemical shifts for many heavier nuclei are more strongly influenced by other factors including excited states ("paramagnetic" contribution to shielding tensor).
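The conversion from a measured frequency to a chemical shift can be written as a one-line calculation. The helper name and the example frequencies below are hypothetical; the reference is taken to be the TMS signal.

```python
def chemical_shift_ppm(signal_hz, reference_hz, spectrometer_mhz):
    """Chemical shift in ppm relative to a reference signal.

    The frequency difference (Hz) is divided by the spectrometer frequency
    (converted to Hz) and multiplied by one million.
    """
    spectrometer_hz = spectrometer_mhz * 1e6
    return (signal_hz - reference_hz) / spectrometer_hz * 1e6

# The same proton observed on two hypothetical instruments:
# its offset from TMS in Hz differs, its chemical shift does not.
print(chemical_shift_ppm(1600.0, 0.0, 400.0))
print(chemical_shift_ppm(3600.0, 0.0, 900.0))
```

Both calls give a shift of about 4 ppm, showing why shifts rather than raw frequencies are reported: they are independent of the field strength of the instrument.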

The chemical shift provides information about the structure of the molecule. The conversion of the raw data to this information is called assigning the spectrum. For example, for the 1H-NMR spectrum for ethanol (CH3CH2OH), one would expect signals at each of three specific chemical shifts: one for the CH3 group, one for the CH2 group and one for the OH group. A typical CH3 group has a shift around 1 ppm, a CH2 attached to an OH has a shift of around 4 ppm and an OH has a shift anywhere from 2–6 ppm depending on the solvent used and the amount of hydrogen bonding. While the O atom does draw electron density away from the attached H through their mutual sigma bond, the electron lone pairs on the O bathe the H in their shielding effect. Because of molecular motion at room temperature, the three methyl protons average out during the NMR experiment (which typically requires a few ms). These protons become degenerate and form a peak at the same chemical shift.

Example ¹H NMR spectrum (1-dimensional) of ethanol plotted as signal intensity vs. chemical shift.

The shape and area of peaks are indicators of chemical structure too. In the example above (the proton spectrum of ethanol), the CH3 peak has three times the area of the OH peak. Similarly, the CH2 peak has twice the area of the OH peak but only 2/3 the area of the CH3 peak.

Software allows analysis of the signal intensity of peaks, which under conditions of optimal relaxation correlates with the number of protons of that type. The analyst must integrate each peak rather than measure its height, because peaks also have width, so the size of a peak is given by its area, not its height. However, the number of protons, or of any other observed nucleus, is proportional to the intensity, or the integral, of the NMR signal only in the very simplest one-dimensional NMR experiments. In more elaborate experiments, for instance those typically used to obtain carbon-13 NMR spectra, the integral of the signals depends on the relaxation rate of the nucleus and on its scalar and dipolar coupling constants. Very often these factors are poorly known; the integral of the NMR signal is therefore very difficult to interpret in more complicated NMR experiments.
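Why area rather than height counts the protons can be shown with a synthetic ethanol-like spectrum. The sketch makes several simplifying assumptions that are not in the text: peaks are modelled as single Gaussians (real lines are closer to Lorentzian and are split by J-coupling), the OH line is made arbitrarily broader than the others, and the shift positions are the approximate values quoted above.

```python
import math

def gaussian_peak(x, centre, area, sigma):
    """Gaussian line with a given integrated area and width."""
    height = area / (sigma * math.sqrt(2.0 * math.pi))
    return height * math.exp(-0.5 * ((x - centre) / sigma) ** 2)

# name: (centre in ppm, number of protons, line width sigma in ppm); assumed values
PEAKS = {"CH3": (1.0, 3.0, 0.02), "CH2": (4.0, 2.0, 0.02), "OH": (2.6, 1.0, 0.08)}

STEP = 0.001
axis = [i * STEP for i in range(5001)]  # 0 to 5 ppm
spectrum = [sum(gaussian_peak(x, *params) for params in PEAKS.values()) for x in axis]

def integrate(lower_ppm, upper_ppm):
    """Trapezoidal integral of the spectrum over a shift window."""
    idx = [i for i, x in enumerate(axis) if lower_ppm <= x <= upper_ppm]
    return sum(0.5 * (spectrum[i] + spectrum[i + 1]) * STEP for i in idx[:-1])

areas = {name: integrate(centre - 0.5, centre + 0.5)
         for name, (centre, _, _) in PEAKS.items()}
heights = {name: max(spectrum[i] for i, x in enumerate(axis) if abs(x - centre) <= 0.5)
           for name, (centre, _, _) in PEAKS.items()}

for name in PEAKS:
    print(f"{name}: area relative to OH = {areas[name] / areas['OH']:.2f}, "
          f"height relative to OH = {heights[name] / heights['OH']:.2f}")
```

The areas recover the 3:2:1 proton ratio, whereas the heights are distorted by the different line widths.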

J-coupling

Some of the most useful information for structure determination in a one-dimensional NMR spectrum comes from J-coupling or scalar coupling (a special case of spin-spin coupling) between NMR active nuclei. This coupling arises from the interaction of different spin states through the chemical bonds of a molecule and results in the splitting of NMR signals. These splitting patterns can be complex or simple and, likewise, can be straightforwardly interpretable or deceptive. This coupling provides detailed insight into the connectivity of atoms in a molecule.

Coupling to n equivalent (spin ½) nuclei splits the signal into an (n+1)-line multiplet with intensity ratios following Pascal's triangle. Coupling to additional spins will lead to further splittings of each component of the multiplet, e.g. coupling to two different spin ½ nuclei with significantly different coupling constants will lead to a doublet of doublets (abbreviation: dd). Note that coupling between nuclei that are chemically equivalent (that is, have the same chemical shift) has no effect on the NMR spectra, and couplings between nuclei that are distant (usually more than 3 bonds apart for protons in flexible molecules) are usually too small to cause observable splittings. Long-range couplings over more than three bonds can often be observed in cyclic and aromatic compounds, leading to more complex splitting patterns.

For example, in the proton spectrum for ethanol described above, the CH3 group is split into a triplet with an intensity ratio of 1:2:1 by the two neighboring CH2 protons. Similarly, the CH2 is split into a quartet with an intensity ratio of 1:3:3:1 by the three neighboring CH3 protons. In principle, the two CH2 protons would also be split again into a doublet to form a doublet of quartets by the hydroxyl proton, but intermolecular exchange of the acidic hydroxyl proton often results in a loss of coupling information.

Coupling to any spin ½ nuclei such as phosphorus-31 or fluorine-19 works in this fashion (although the magnitudes of the coupling constants may be very different). But the splitting patterns differ from those described above for nuclei with spin greater than ½ because the spin quantum number has more than two possible values. For instance, coupling to deuterium (a spin 1 nucleus) splits the signal into a 1:1:1 triplet because the spin 1 has three spin states. Similarly, a spin 3/2 nucleus splits a signal into a 1:1:1:1 quartet and so on.
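The intensity patterns described in this section can be generated by repeated splitting: each coupled nucleus of spin I splits every existing line into 2I + 1 equally intense lines, and lines that coincide add up. The sketch below assumes n equivalent nuclei sharing one coupling constant; the function name is hypothetical.

```python
def multiplet_pattern(n_nuclei, spin):
    """Relative line intensities for coupling to n equivalent nuclei of a given spin."""
    lines_per_nucleus = int(round(2 * spin)) + 1
    pattern = [1]
    for _ in range(n_nuclei):
        # Split every existing line into equally spaced, equally intense lines
        # and accumulate the intensities of lines that land on the same position.
        split = [0] * (len(pattern) + lines_per_nucleus - 1)
        for position, intensity in enumerate(pattern):
            for offset in range(lines_per_nucleus):
                split[position + offset] += intensity
        pattern = split
    return pattern

print("CH3 of ethanol, two CH2 neighbours  :", multiplet_pattern(2, 0.5))
print("CH2 of ethanol, three CH3 neighbours:", multiplet_pattern(3, 0.5))
print("one deuterium (spin 1)              :", multiplet_pattern(1, 1))
print("one spin 3/2 nucleus                :", multiplet_pattern(1, 1.5))
print("two equivalent deuterium nuclei     :", multiplet_pattern(2, 1))
```

For spin ½ the result is a row of Pascal's triangle; for higher spins the same procedure gives the equal-intensity triplet and quartet mentioned above.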

Coupling combined with the chemical shift (and the integration for protons) tells us not only about the chemical environment of the nuclei, but also the number of neighboring NMR active nuclei within the molecule. In more complex spectra with multiple peaks at similar chemical shifts or in spectra of nuclei other than hydrogen, coupling is often the only way to distinguish different nuclei.

Two-dimensional NMR spectroscopy

2D NMR is a set of NMR methods which give data plotted in a space defined by two frequency axes rather than one. Two-dimensional NMR spectra provide more information about a molecule than one-dimensional NMR spectra and are especially useful in determining the structure of a molecule, particularly for molecules that are too complicated to work with using one-dimensional NMR.

Each experiment consists of a sequence of radio frequency (RF) pulses with delay periods in between them. It is the timing, frequencies, and intensities of these pulses that distinguish different NMR experiments from one another. Almost all two-dimensional experiments have four stages: the preparation period, where a magnetization coherence is created through a set of RF pulses; the evolution period, a determined length of time during which no pulses are delivered and the nuclear spins are allowed to freely precess (rotate); the mixing period, where the coherence is manipulated by another series of pulses into a state which will give an observable signal; and the detection period, in which the free induction decay signal from the sample is observed as a function of time.

The two dimensions of a two-dimensional NMR experiment are two frequency axes representing a chemical shift. Each frequency axis is associated with one of the two time variables, which are the length of the evolution period (the evolution time) and the time elapsed during the detection period (the detection time). They are each converted from a time series to a frequency series through a two-dimensional Fourier transform. A single two-dimensional experiment is generated as a series of one-dimensional experiments, with a different specific evolution time in successive experiments, with the entire duration of the detection period recorded in each experiment.
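The construction of a two-dimensional spectrum from a series of one-dimensional experiments can be sketched with a synthetic data set. The example assumes a single signal that oscillates at one frequency during the evolution time and at another during detection, a 16 by 16 grid, arbitrary decay constants, and a naive discrete Fourier transform; none of these values come from the text.

```python
import cmath
import math

N_T1, N_T2 = 16, 16      # number of evolution-time steps and detection points
F1_BIN, F2_BIN = 3, 5    # assumed frequencies, in units of one spectral bin

def dft(samples):
    """Naive discrete Fourier transform."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * m / n)
                for m, s in enumerate(samples))
            for k in range(n)]

# Each row is one 1-D experiment recorded with a different evolution time t1.
time_data = [[cmath.exp(2j * math.pi * (F1_BIN * i / N_T1 + F2_BIN * j / N_T2))
              * math.exp(-(i + j) / 10.0)
              for j in range(N_T2)]
             for i in range(N_T1)]

# Transform along the detection time t2 (every row), then along t1 (every column).
after_t2 = [dft(row) for row in time_data]
columns = [dft([after_t2[i][k] for i in range(N_T1)]) for k in range(N_T2)]
spectrum = [[abs(columns[k][m]) for k in range(N_T2)] for m in range(N_T1)]

peak = max(((m, k) for m in range(N_T1) for k in range(N_T2)),
           key=lambda mk: spectrum[mk[0]][mk[1]])
print("peak at (f1 bin, f2 bin) =", peak)
```

The single peak appears at the pair of frequencies built into the signal, one coordinate from each time variable.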

The end result is a plot showing an intensity value for each pair of frequency variables. The intensities of the peaks in the spectrum can be represented using a third dimension. More commonly, intensity is indicated using contour lines or different colors.

Homonuclear through-bond correlation methods

In these methods, magnetization transfer occurs between nuclei of the same type, through J-coupling of nuclei connected by up to a few bonds.

In standard COSY, the preparation (p1) and mixing (p2) periods each consist of a single 90° pulse separated by the evolution time t1, and the resonance signal from the sample is read during the detection period over a range of times t2.

The first and most popular two-dimensional NMR experiment is the homonuclear correlation spectroscopy (COSY) sequence, which is used to identify spins that are coupled to each other.

¹H COSY spectrum of progesterone.

The two-dimensional spectrum that results from the COSY experiment shows the frequencies for a single isotope, most commonly hydrogen (¹H), along both axes. COSY spectra show two types of peaks. Diagonal peaks have the same frequency coordinate on each axis and appear along the diagonal of the plot, while cross peaks have different values for each frequency coordinate and appear off the diagonal. Diagonal peaks correspond to the peaks in a 1D-NMR experiment, while the cross peaks indicate couplings between pairs of nuclei (much as multiplet splitting indicates couplings in 1D-NMR).
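
The distinction between diagonal and cross peaks can be expressed as a short sketch. The Python below assumes a hypothetical peak list of (F1, F2) chemical shifts in ppm and an arbitrary matching tolerance; neither comes from a real spectrum, and real peak picking is considerably more involved.

```python
def classify_cosy_peaks(peaks, tolerance=0.02):
    """Split (f1, f2) peaks into diagonal and cross peaks.

    A peak counts as diagonal when its two coordinates agree within
    `tolerance` (an assumed value, in ppm).
    """
    diagonal, cross = [], []
    for f1, f2 in peaks:
        if abs(f1 - f2) <= tolerance:
            diagonal.append((f1, f2))
        else:
            cross.append((f1, f2))
    return diagonal, cross

def coupled_pairs(cross, tolerance=0.02):
    """Merge the symmetric cross peaks (a, b) and (b, a) into one coupling."""
    pairs = []
    for f1, f2 in cross:
        low, high = sorted((f1, f2))
        already_seen = any(
            abs(low - p) <= tolerance and abs(high - q) <= tolerance
            for p, q in pairs
        )
        if not already_seen:
            pairs.append((low, high))
    return pairs

# Hypothetical peak list: two resonances that are coupled to each other.
peaks = [(1.20, 1.20), (3.60, 3.60), (1.20, 3.60), (3.60, 1.20)]
diagonal, cross = classify_cosy_peaks(peaks)
print(diagonal)
print(coupled_pairs(cross))
```

Because a COSY spectrum is symmetric about the diagonal, each coupling normally appears as two cross peaks, which is why the second function merges mirrored entries.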

Total correlation spectroscopy (TOCSY) is similar to COSY, in that cross peaks of coupled protons are observed. However, cross peaks are observed not only for nuclei which are directly coupled, but also between nuclei which are connected by a chain of couplings. This makes it useful for identifying the larger interconnected networks of spin couplings. This ability is achieved by inserting a repetitive series of pulses which cause isotropic mixing during the mixing period. Longer isotropic mixing times cause the polarization to spread out through an increasing number of bonds.
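
The difference between direct couplings and chains of couplings can be pictured as a graph problem. The sketch below is an idealization under stated assumptions: nuclei are given hypothetical labels, the direct couplings are supplied as pairs, and the mixing time is taken to be long enough that polarization spreads through the whole connected network. Real TOCSY intensities depend on coupling sizes and mixing time, which are ignored here.

```python
from itertools import combinations

def spin_systems(couplings):
    """Group nuclei into networks connected by chains of direct couplings."""
    neighbours = {}
    for a, b in couplings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, systems = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                         # depth-first walk of one network
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        systems.append(group)
    return systems

def expected_tocsy_cross_peaks(couplings):
    """Every pair inside one network, in the assumed long-mixing-time limit."""
    peaks = set()
    for group in spin_systems(couplings):
        peaks.update(frozenset(pair) for pair in combinations(sorted(group), 2))
    return peaks

# Hypothetical direct couplings: HA-HB-HG form a chain, HX-HY are separate.
direct = [("HA", "HB"), ("HB", "HG"), ("HX", "HY")]
for pair in sorted(sorted(p) for p in expected_tocsy_cross_peaks(direct)):
    print(pair)
```

In this toy example HA and HG are not directly coupled, so a COSY-style list would not pair them, but they belong to the same network and therefore appear together in the idealized TOCSY list.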

Heteronuclear through-bond correlation methods

Heteronuclear correlation spectroscopy gives signals based on coupling between nuclei of two different types. Often the two nuclei are a proton and another nucleus (called a "heteronucleus").

¹H–¹⁵N HSQC spectrum of a fragment of the protein NleG3-2. Each peak in the spectrum represents a bonded N-H pair, with its two coordinates corresponding to the chemical shifts of each of the H and N atoms. Some of the peaks are labeled with the amino acid residue that gives that signal.

Heteronuclear single-quantum correlation spectroscopy (HSQC) detects correlations between nuclei of two different types which are separated by one bond. This method gives one peak per pair of coupled nuclei, whose two coordinates are the chemical shifts of the two coupled atoms.

HSQC works by transferring magnetization from the S (sensitive) nucleus (usually the proton) to the I (insensitive) nucleus (usually the heteroatom) using the INEPT pulse sequence; this first step is done because the proton has a greater equilibrium magnetization, so starting from it gives a stronger signal. The magnetization then evolves and is transferred back to the S nucleus for observation. An extra spin echo step can optionally be used to decouple the signal, simplifying the spectrum by collapsing multiplets to a single peak. The undesired uncoupled signals are removed by running the experiment twice with the phase of one specific pulse reversed; this reverses the signs of the desired but not the undesired peaks, so subtracting the two spectra gives only the desired peaks.
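
The two-scan subtraction described above is simple arithmetic, shown here as a minimal sketch. The spectra are represented as plain lists of made-up intensities, and the function name is hypothetical; the only point is that a contribution whose sign follows the pulse phase survives the subtraction while one that does not is cancelled.

```python
def two_step_phase_cycle(desired, undesired):
    """Combine two scans in which only the desired signal changes sign."""
    # Scan A: reference pulse phase, both contributions add.
    scan_a = [d + u for d, u in zip(desired, undesired)]
    # Scan B: pulse phase reversed, so the desired part is inverted.
    scan_b = [-d + u for d, u in zip(desired, undesired)]
    # Subtracting removes the undesired part; halving restores the scale.
    return [(a - b) / 2 for a, b in zip(scan_a, scan_b)]

# Made-up intensities at four positions in a spectrum.
desired = [0.0, 1.0, 0.0, 0.5]
undesired = [0.3, 0.3, 2.0, 0.0]
print(two_step_phase_cycle(desired, undesired))
```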

The ¹⁵N HSQC experiment is one of the most frequently recorded experiments in protein NMR. Each residue of the protein, except for proline, has an amide proton attached to a nitrogen in the peptide bond. The HSQC provides the correlation between the nitrogen and the amide proton, and each amide yields a peak in the HSQC spectrum.

Through-space correlation methods

These methods establish correlations between nuclei which are physically close to each other regardless of whether there is a bond between them. They use the Nuclear Overhauser effect (NOE) by which nearby atoms (within about 5 Å) undergo cross relaxation by a mechanism related to spin–lattice relaxation.

In nuclear Overhauser effect spectroscopy (NOESY), the nuclear Overhauser cross relaxation between nuclear spins during the mixing period is used to establish the correlations. The spectrum obtained is similar to COSY, with diagonal peaks and cross peaks; however, the cross peaks connect resonances from nuclei that are spatially close rather than those that are through-bond coupled to each other. NOESY spectra also contain extra axial peaks which do not provide extra information and can be eliminated through a different experiment by reversing the phase of the first pulse.

One application of NOESY is the study of large biomolecules, as in protein NMR, where resonances can often be assigned using sequential walking.

Procedure

The NMR sample is prepared in a thin-walled glass tube.

Protein nuclear magnetic resonance is performed on aqueous samples of highly purified protein. In contrast to X-ray crystallography, NMR spectroscopy is usually limited to proteins smaller than 35 kDa, although larger structures have been solved. NMR spectroscopy is often the only way to obtain high-resolution information on partially or wholly intrinsically unstructured proteins. To facilitate the experiments, it is desirable to isotopically label the protein with ¹³C and ¹⁵N, because the predominant naturally occurring isotope ¹²C is not NMR-active, whereas the nuclear quadrupole moment of the predominant naturally occurring ¹⁴N isotope prevents high-resolution information from being obtained from this nitrogen isotope.

In order to analyze the nuclear magnetic resonance data, it is important to obtain a resonance assignment for the protein, that is, to determine which chemical shift corresponds to which atom. This is typically achieved by sequential walking using information derived from several different types of NMR experiment. The exact procedure depends on whether the protein is isotopically labelled or not, since many of the assignment experiments depend on carbon-13 and nitrogen-15.

In order to perform structure calculations, a number of experimentally determined restraints have to be generated. These fall into different categories, the most widely used being distance restraints and angle restraints. For example, a cross peak in a NOESY experiment signifies spatial proximity between the two nuclei in question. Thus, each peak can be converted into a maximum distance between the nuclei, usually between 1.8 and 6 angstroms. The intensity of a NOESY peak is proportional to the inverse sixth power of the distance, so the distance is determined from the intensity of the peak. The intensity-distance relationship is not exact, so usually a distance range is used.
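
The inverse sixth power relationship lends itself to a short worked example. The sketch below assumes calibration against one reference peak whose distance is known; the reference values and the 25% slack on the upper bound are illustrative assumptions, while the 1.8 and 6 angstrom limits are the ones quoted above.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from a NOESY peak intensity.

    Uses I proportional to r**-6, so
    r = r_ref * (I_ref / I) ** (1/6).
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

def distance_range(intensity, ref_intensity, ref_distance,
                   lower=1.8, slack=0.25, cap=6.0):
    """Turn the inexact estimate into a (lower, upper) restraint in angstroms.

    `slack` (an assumed 25%) loosens the upper bound; `lower` and `cap`
    are the 1.8 and 6 angstrom limits mentioned in the text.
    """
    estimate = noe_distance(intensity, ref_intensity, ref_distance)
    upper = min(estimate * (1.0 + slack), cap)
    return lower, upper

# Hypothetical calibration: a reference pair 2.0 angstroms apart
# gives a peak 64 times stronger than the peak of interest.
print(noe_distance(1.0, 64.0, 2.0))
print(distance_range(1.0, 64.0, 2.0))
```

A peak 64 times weaker than the reference corresponds to a distance twice as large, because the sixth root of 64 is 2. The steepness of this dependence is also why small errors in intensity translate into only modest errors in distance, and why a range rather than a single value is normally used.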

The experimentally determined restraints can be used as input for the structure calculation process. Researchers attempt to satisfy as many of the restraints as possible, in addition to general properties of proteins such as bond lengths and angles. The algorithms convert the restraints and the general protein properties into energy terms, and then try to minimize the energy. The process results in an ensemble of structures that, if the data were sufficient to dictate a certain fold, will converge.
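
The idea of converting restraints into an energy and minimizing it can be sketched in a few lines. The following Python is a deliberately simplified illustration and not the method used by any particular structure calculation program: it assumes a flat-bottomed harmonic penalty for distance restraints only, plain gradient descent, and random starting coordinates, and it omits bond lengths, angles and all other protein properties. All names and parameter values are hypothetical.

```python
import math
import random

def restraint_energy(coords, restraints, k=1.0):
    """Flat-bottomed harmonic penalty: zero when lower <= d <= upper."""
    energy = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d > upper:
            energy += k * (d - upper) ** 2
        elif d < lower:
            energy += k * (lower - d) ** 2
    return energy

def minimize(coords, restraints, steps=2000, step_size=0.05, k=1.0):
    """Gradient descent on the restraint energy using its analytic gradient."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, lower, upper in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue                      # direction undefined, skip
            if d > upper:
                violation = d - upper         # positive: pull together
            elif d < lower:
                violation = d - lower         # negative: push apart
            else:
                continue                      # restraint satisfied
            for axis in range(3):
                g = 2 * k * violation * (coords[i][axis] - coords[j][axis]) / d
                grads[i][axis] += g
                grads[j][axis] -= g
        for point, grad in zip(coords, grads):
            for axis in range(3):
                point[axis] -= step_size * grad[axis]
    return coords

def ensemble(n_atoms, restraints, n_models=5, seed=0):
    """Minimize from several random starts to obtain a set of models."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        start = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n_atoms)]
        models.append(minimize(start, restraints))
    return models

# Hypothetical restraints: (atom i, atom j, lower bound, upper bound).
restraints = [(0, 1, 1.8, 3.0), (1, 2, 1.8, 4.0)]
for model in ensemble(3, restraints):
    print(restraint_energy(model, restraints))
```

Each model starts from different random coordinates, so the resulting set plays the role of the ensemble described above. With only two restraints the models satisfy the data but differ widely from one another, which mirrors the point that an ensemble converges only when the restraints are sufficient to dictate a fold.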
