Structural Biochemistry/Control of Gene Expression in Prokaryotes

DNA-Binding Proteins Distinguish Specific Sequences of DNA

Prokaryotes respond to environmental changes most often by altering their gene expression. A gene is expressed when it is transcribed into RNA and, for most genes, translated into protein. Expression is either constitutive, where genes are expressed continuously, or regulated, where specific conditions must be met inside the cell before a gene is expressed. This sub-section focuses on how prokaryotes regulate the expression of their genes.

Transcription, the synthesis of RNA from a DNA template, is the primary point at which gene activity is controlled. Proteins interact with DNA sequences to either promote or prevent the transcription of a gene.

Keep in mind that the DNA sequences of genes themselves carry no distinguishing features that a regulatory system could recognize. Gene regulation in prokaryotes therefore relies on other sequences within the genome, called regulatory sites. These regulatory sites are usually binding sites for specific DNA-binding proteins and, in prokaryotes, lie close to the DNA that is to be transcribed.

An example of such a regulatory site is found in E. coli: when the sugar lactose is introduced into the environment of the bacterium, a gene encoding the enzyme β-galactosidase begins to be expressed. This enzyme processes lactose so that the cell can extract energy and carbon from it.

lac regulatory site

The nucleotide sequence of this regulatory site (pictured) is a nearly perfect inverted repeat, so the DNA has a nearly twofold axis of symmetry. In most regulatory sites, this symmetry in the DNA corresponds to symmetry in the protein that binds to the site. Matching symmetry of this kind is a recurring theme in protein-DNA interactions.
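The symmetry described above can be checked computationally: a sequence is a perfect inverted repeat when it is identical to its own reverse complement. The sketch below scores how close a sequence comes to that ideal. The function names and both example sequences are illustrative assumptions; they are not the actual lac regulatory site.

```python
# Watson-Crick pairing used to build the complementary strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def inverted_repeat_score(seq: str) -> float:
    """Fraction of positions at which seq matches its reverse complement.

    A score of 1.0 means a perfect inverted repeat (exact twofold symmetry).
    """
    seq = seq.upper()
    partner = reverse_complement(seq)
    matches = sum(a == b for a, b in zip(seq, partner))
    return matches / len(seq)


# Hypothetical sequences chosen only to illustrate the calculation.
perfect = "TGTGAGCGCTCACA"
nearly = "TGTGAGCGCTTACA"  # one base changed relative to `perfect`

print(inverted_repeat_score(perfect))
print(inverted_repeat_score(nearly))
```

Because each base is compared with its symmetry partner, a single substitution lowers the score at two positions rather than one.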

To investigate the protein-DNA interactions at the lac regulatory site further, scientists determined the structure of the complex formed between the site itself, as part of a larger oligonucleotide, and the DNA-binding unit that recognizes it. This DNA-binding unit comes from a protein called the lac repressor. The function of the lac repressor, as the name suggests, is to repress the expression of the lactose-processing gene. The DNA-binding unit binds as a dimer, and its twofold axis of symmetry matches the symmetry of the DNA. Each monomer inserts an α helix into the major groove of the DNA. There, amino acid side chains form site-specific hydrogen bonds with the exposed base pairs, so that the lac repressor binds very tightly only to this specific site in the genome of E. coli.

The helix-turn-helix motif is common to many prokaryotic DNA-binding proteins

Once the structures of many prokaryotic DNA-binding proteins had been determined, a recurring structural pattern emerged: a pair of α helices separated by a tight turn. This pattern is called the helix-turn-helix motif. The second α helix (called the recognition helix) lies in the major groove and interacts with base pairs, while the first α helix is primarily in contact with the DNA backbone.

In Prokaryotes, DNA-Binding Proteins Bind Explicitly to Regulatory Sites in Operons

Returning to the example of E. coli and β-galactosidase, we can draw out the common principles of how DNA-binding proteins carry out regulation. When the environment of E. coli lacks glucose, its primary source of carbon and energy, the bacterium can switch to lactose as a carbon source by means of the enzyme β-galactosidase. β-galactosidase hydrolyzes lactose into glucose and galactose, which are then metabolized by the cell. Galactoside permease transports lactose across the cell membrane of the bacterium and is essential for lactose metabolism. Thiogalactoside transacetylase, on the other hand, is not required for lactose metabolism but plays a role in detoxifying compounds that the permease may also transport. The general principle is that the expression levels of a group of proteins that together help the cell adapt to an environmental change rise and fall together.


An E. coli bacterium growing on a carbon source such as glucose or glycerol contains around 10 or fewer molecules of β-galactosidase. This number rises into the thousands, however, when the bacterium is grown on lactose. The presence of lactose alone greatly increases the amount of β-galactosidase, and it does so by promoting the synthesis of new enzyme molecules rather than by activating a precursor.

While working out the mechanism of gene regulation in this case, researchers observed that two other proteins, galactoside permease and thiogalactoside transacetylase, were synthesized alongside β-galactosidase.

An operon is made up of regulatory components and genes that encode proteins

The fact that β-galactosidase, the transacetylase, and the permease were regulated in concert indicated that a common mechanism controlled the expression of the genes encoding all three. François Jacob and Jacques Monod proposed the operon model to explain this parallel regulation and other observations. The three genetic elements of the operon model are (1) a set of structural genes, (2) an operator site (a regulatory DNA sequence), and (3) a regulator gene that encodes the regulator protein.

To inhibit the transcription of the structural genes, the regulator gene encodes a repressor protein that binds to the operator site. In the lac operon, the i gene encodes the repressor, which binds to the o operator site and prevents transcription of the z, y, and a genes (the structural genes for β-galactosidase, the permease, and the transacetylase, respectively). The operon also contains a promoter site, p, which directs RNA polymerase to the proper transcription initiation site. The three structural genes are transcribed as a single mRNA that encodes β-galactosidase, the permease, and the transacetylase. Because this mRNA encodes more than one protein, it is called a polycistronic (or polygenic) transcript.

The lactose operon

The lac repressor protein in the absence of lactose binds to the operator and blocks transcription

The lac repressor (pictured bound to DNA) is a tetramer whose subunits each have an amino-terminal and a carboxyl-terminal domain. The amino-terminal domain binds the DNA, while the carboxyl-terminal domain forms a separate structure. The two subunits pictured come together to form the DNA-binding unit. When lactose is absent from the environment of the bacterium, the lac repressor binds to the operator DNA tightly and rapidly (it binds 4 × 10^6 times as strongly to the operator as to random sites on the genome). The bound repressor prevents RNA polymerase from transcribing the z, y, and a genes, which lie downstream of the promoter site and code for the three proteins. The dissociation constant for the complex formed by the lac repressor and the operator is around 0.1 pM, and the association rate constant is a remarkably high 10^10 M^-1 s^-1. This rate suggests that the repressor finds the operator by diffusing along the DNA molecule rather than by encountering it directly from the aqueous medium.
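The two constants quoted above also imply how long a repressor stays bound once it has found the operator. The following is a minimal sketch, assuming simple one-step binding in which the dissociation constant equals k_off / k_on; the variable names are illustrative, and the real repressor-operator kinetics are more complex than this model.

```python
import math

# Values quoted in the text for the lac repressor-operator complex.
K_D = 0.1e-12   # dissociation constant, in M (0.1 pM)
K_ON = 1e10     # association rate constant, in M^-1 s^-1

# Assumption: simple one-step binding, for which K_D = k_off / k_on.
k_off = K_D * K_ON                 # dissociation rate constant, in s^-1
half_life = math.log(2) / k_off    # half-life of the complex, in s

print(f"k_off = {k_off:.1e} s^-1")
print(f"half-life = {half_life:.0f} s (about {half_life / 60:.0f} min)")
```

Under this assumption the dissociation rate constant comes out near 10^-3 s^-1, which corresponds to a half-life of roughly 700 seconds for the bound complex.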

The lac repressor binds DNA with such high specificity that the operator is effectively a unique site within the genome of E. coli. The amino-terminal domains of one dimer bind the operator site, which leaves the amino-terminal domains of the other dimer in the tetramer free to bind one of two sites, within 500 bp of the primary operator site, that approximate the sequence of the operator. Each monomer interacts with the major groove of the bound DNA via a helix-turn-helix unit.

Ligand binding can induce structural changes in regulatory proteins

Now let us look at how the presence of lactose changes the behavior of the repressor and the expression of the operon. Inducible operons such as lac respond to inducers, signal molecules that trigger expression of the genes within the operon. The inducer of the lac operon is allolactose, a molecule of galactose and glucose joined by a β-1,6 linkage (rather than the β-1,4 linkage found in lactose).

In the β-galactosidase reaction, allolactose is a side product and is produced at low levels when the levels of β-galactosidase are low in the bacterium. Additionally, though not a substrate of the enzyme, isopropylthiogalactoside (IPTG) is a powerful inducer of β-galactosidase expression.

In the lac operon, the inducer prompts gene expression by preventing the lac repressor from binding to the operator. It does so by binding to the repressor itself: the inducer binds to each monomer at the center of the large domain, causing conformational changes in the DNA-binding domain that drastically reduce the affinity of the repressor for operator DNA.

The operon is a common regulatory unit in prokaryotes

Many other gene-regulatory networks in prokaryotes function analogously to the lac operon. One example is the network that controls the synthesis of purines (and, to a lesser extent, pyrimidines). These genes are repressed by the pur repressor, which is 31% identical in sequence to the lac repressor and has a similar three-dimensional structure. The pur repressor, however, behaves in the opposite way from the lac repressor: it blocks transcription by binding to a specific DNA site only when it is also bound to a small molecule called a corepressor (either guanine or hypoxanthine).

Transcription can be stimulated by proteins that contact RNA polymerase

While the previous examples of DNA-binding proteins all function by preventing the transcription of a DNA sequence until some condition in the environment is met, there are also examples of DNA-binding proteins that actually encourage transcription.

A good example is the catabolite activator protein in E. coli. When the bacterium is grown on glucose, it contains very low amounts of the catabolic enzymes that metabolize other sugars. The genes that encode these enzymes are in fact inhibited by glucose, an effect known as catabolite repression. Glucose lowers the concentration of cAMP (cyclic AMP). When the concentration of cAMP is high, it stimulates the transcription of the genes for these catabolic enzymes. This is where the catabolite activator protein (CAP, also called CRP for cAMP receptor protein) comes into play. CAP, when bound to cAMP, stimulates the transcription of lactose- and arabinose-catabolizing genes. CAP binds only to a specific DNA sequence: it binds as a dimer to an inverted repeat at position -61 relative to the start site for transcription, adjacent to where RNA polymerase binds (pictured).

The CAP-cAMP complex enhances transcription by about a factor of 50, because energetically favorable contacts between CAP and RNA polymerase help the polymerase bind at the promoter. Because there are multiple CAP binding sites within the E. coli genome, a rise in the concentration of cAMP inside the bacterium leads to the formation of CAP-cAMP complexes at many of these sites, and thus to the transcription of many genes coding for various catabolic enzymes.
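Taken together, the lac repressor and CAP make the lac operon respond to two signals at once. The sketch below reduces this to a qualitative truth table. It is a deliberate simplification with hypothetical names: real expression levels are graded rather than three discrete states, and "off" here still allows the few molecules per cell mentioned earlier.

```python
def lac_operon_state(lactose: bool, glucose: bool) -> str:
    """Qualitative transcription state of the lac operon.

    Simplifying assumptions (for illustration only):
    - lactose present -> inducer is made, repressor leaves the operator
    - glucose present -> cAMP is low, so no CAP-cAMP complex forms
    """
    repressor_bound = not lactose
    cap_camp_bound = not glucose

    if repressor_bound:
        return "off"    # operator blocked regardless of CAP
    if cap_camp_bound:
        return "high"   # derepressed and stimulated by CAP-cAMP
    return "low"        # derepressed but not stimulated


for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination of lactose present and glucose absent gives the "high" state, which matches the idea that the cell invests in lactose metabolism only when lactose is available and the preferred sugar is not.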

Regulatory Circuits Can Result in Switching Between Patterns of Gene Expression

Studies of bacterial viruses, especially bacteriophage λ, have been invaluable for understanding how gene-regulatory networks function. Bacteriophage λ can develop via either a lytic or a lysogenic pathway. In the lytic pathway, most of the genes in the viral genome are transcribed, which leads to the production of numerous virus particles (~100) and the eventual lysis of the bacterium. In the lysogenic pathway, the viral genome is incorporated into the bacterial DNA, where most of the viral genes stay unexpressed; this allows the viral genetic material to be copied and passed on each time the bacterium replicates. Two essential proteins, together with a set of regulatory sequences within the viral genome, control the switch between the two pathways.

Lambda repressor regulates its own expression

λ repressor

The λ repressor is one of these key regulatory proteins. It is self-regulating: when its levels are low it promotes transcription of the gene that encodes it, and when its levels are high it blocks transcription of that gene. While the λ repressor binds to many sites in the λ phage genome, the one relevant here is the right operator, which includes three binding sites for the λ repressor dimer as well as two promoters within a region of approximately 80 base pairs. The first promoter drives expression of the λ repressor gene, while the other drives expression of a variety of other viral genes. The λ repressor binds to the first operator site with the highest affinity, and once it is bound there, the likelihood of a second repressor dimer binding to the adjacent operator site increases about 25-fold. When the first and second operator sites are both occupied, the λ repressor blocks transcription of the adjacent gene, which encodes the protein Cro (controller of repressor and others). The repressor dimer at the second operator site can also interact with RNA polymerase to stimulate transcription from the promoter that controls the λ repressor gene. This is how the λ repressor promotes its own production.

λ repressor fusions can be used to study protein-protein interactions in E. coli. The λ repressor has two domains: the N-terminal domain (DNA-binding activity) and the C-terminal domain (dimerization). To produce an active repressor fusion, the C-terminal domain is replaced with a heterologous domain that forms a dimer or higher-order oligomer. Inactive repressor fusions, by contrast, cannot bind the operator DNA and therefore do not affect the expression of phage or reporter genes.[1]

A circuit based on lambda repressor and Cro forms a genetic switch

As described above, the λ repressor blocks production of Cro by binding most tightly to the first operator site. Cro, meanwhile, blocks production of the λ repressor by binding most tightly to the third operator site. This circuit determines whether the lytic or the lysogenic pathway is followed: if the λ repressor is high and Cro is low, the lysogenic pathway is chosen; if Cro is high and the λ repressor is low, the lytic pathway is chosen.
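The switch-like behavior of two proteins that each block the other's production can be illustrated with a toy simulation. The sketch below is a generic mutual-repression model, not a model of the actual λ operator sites: the equations, the parameters alpha and n, and the function name are all assumptions made for illustration, and the model leaves out the repressor's stimulation of its own gene.

```python
def simulate_switch(repressor0: float, cro0: float,
                    alpha: float = 10.0, n: int = 2,
                    dt: float = 0.01, steps: int = 5000):
    """Euler integration of a generic two-protein mutual-repression model.

    d[rep]/dt = alpha / (1 + cro**n) - rep
    d[cro]/dt = alpha / (1 + rep**n) - cro

    alpha (maximal synthesis rate) and n (cooperativity) are hypothetical,
    dimensionless parameters, not measured values for phage lambda.
    """
    rep, cro = repressor0, cro0
    for _ in range(steps):
        d_rep = alpha / (1 + cro ** n) - rep
        d_cro = alpha / (1 + rep ** n) - cro
        rep += dt * d_rep
        cro += dt * d_cro
    return rep, cro


# Same equations, different starting points, opposite outcomes.
lysogenic_like = simulate_switch(repressor0=5.0, cro0=0.1)
lytic_like = simulate_switch(repressor0=0.1, cro0=5.0)
print("repressor-dominated state (repressor, Cro):", lysogenic_like)
print("Cro-dominated state (repressor, Cro):", lytic_like)
```

With these assumed parameters, whichever protein starts ahead ends up dominant and holds the other at a low level, so the system settles into one of two stable states, in the spirit of the choice between the lysogenic and lytic pathways.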

Many prokaryotic cells release chemical signals that regulate gene expression in other cells

Some prokaryotes also take part in a process called quorum sensing, in which they release chemicals called autoinducers into their medium. These autoinducers, which are most often acyl homoserine lactones, are taken up by the surrounding cells. When the level of autoinducer reaches a certain threshold, receptor proteins bind to it and activate the expression of several genes, including those that promote the synthesis of more autoinducer. In this way prokaryotes interact with one another chemically and change their gene-expression patterns depending on how many other cells are present in their medium. Quorum sensing plays a role in the formation of biofilms, which are structured communities of prokaryotes.

Gene Expression Can Be Controlled at Posttranscriptional Levels

Although most regulation of gene expression takes place at the initiation of transcription, later steps in transcription and translation are also possible targets for regulation.

Tryptophan Operon

Genetic studies of the tryptophan operon (abbreviated as trp operon), carried out to understand the regulation of tryptophan synthesis, reveal two types of mutants. One type carries mutations in the structural genes, and the other carries regulatory mutations. Mutants with structural gene mutations are auxotrophic for tryptophan and need tryptophan in order to grow. To convert the precursor molecule chorismate to tryptophan, the trpE, trpD, trpC, trpB, and trpA genes are transcribed as a polycistronic message, and this mRNA is translated into the enzymes that carry out the conversion.


The second type of mutant constitutively synthesizes the enzymes needed for tryptophan synthesis. The trpR gene codes for the tryptophan repressor and maps to a different quadrant of the E. coli chromosome from the trp operon. In these mutants, trpR cannot regulate the synthesis of tryptophan effectively. Studies of the dimeric trp repressor protein show that it does not function alone. The repressor must bind the end product of the metabolic pathway in order to regulate the synthesis of tryptophan. Thus, tryptophan is a corepressor for its own biosynthesis. This process is called feedback repression at the transcriptional level.

When the concentration of tryptophan is high enough, the repressor binds tryptophan to form a repressor-tryptophan complex. This complex attaches to the operator region of the trp operon and prevents RNA polymerase from binding and initiating transcription of the structural genes. Conversely, when the concentration of tryptophan in the cell is low, the repressor-tryptophan complex does not form, so RNA polymerase is able to bind and transcribe the structural genes. Tryptophan is then synthesized.[2]

Attenuation is a prokaryotic mechanism for regulating transcription through the modulation of nascent RNA secondary structure

While studying the tryptophan operon, Charles Yanofsky discovered another means of transcription regulation. The trp operon encodes 5 enzymes that convert chorismate into tryptophan, and upon examining the 5’ end of the trp mRNA he found there was a leader sequence consisting of 162 nucleotides that came before the initiation codon of the first enzyme. His next observation was that only the first 130 nucleotides were produced as a transcript when the levels of tryptophan were high, but when levels were low a 7000-nucleotide trp mRNA which included the entire leader sequence was produced. This mode of regulation is called attenuation, where transcription is cut off before any mRNA coding for the enzymes is produced.

Attenuation depends on features at the 5' end of the mRNA. The first part of the leader sequence codes for a leader peptide of 14 amino acids. The attenuator, an RNA region capable of forming several alternative structures, comes after the open reading frame for this peptide. Because transcription and translation are tightly coupled in bacteria, translation of the trp mRNA begins soon after the ribosome-binding site has been synthesized.

The structure of the mRNA is altered by a ribosome that stalls when an aminoacyl-tRNA needed to translate the leader mRNA is absent. This allows RNA polymerase to transcribe the operon past the attenuator site.
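The decision made at the attenuator can be summarized as a small piece of logic. The sketch below is a qualitative illustration only: the function name and the yes/no treatment of tryptophan availability are assumptions, and the transcript lengths are the approximate figures quoted earlier in this section.

```python
# Transcript lengths quoted in the text for the trp operon.
ATTENUATED_LENGTH_NT = 130    # transcription stops within the leader
FULL_LENGTH_NT = 7000         # leader plus the structural genes


def trp_transcript(tryptophan_abundant: bool) -> dict:
    """Qualitative outcome of attenuation for one round of transcription.

    Assumption: the ribosome translating the leader peptide stalls only
    when charged tRNA is scarce, which here stands in for low tryptophan.
    """
    ribosome_stalls = not tryptophan_abundant
    if ribosome_stalls:
        # Stalling changes the leader RNA fold, so RNA polymerase
        # continues past the attenuator.
        return {"terminated_early": False, "length_nt": FULL_LENGTH_NT}
    # No stall: the attenuator structure forms and transcription stops.
    return {"terminated_early": True, "length_nt": ATTENUATED_LENGTH_NT}


for abundant in (True, False):
    print(abundant, trp_transcript(abundant))
```

The point of the sketch is that the same leader RNA gives two different outcomes depending solely on whether the ribosome keeps pace with RNA polymerase.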


  1. Leonardo Mariño-Ramírez, Lisa Campbell, and James C. Hu. Screening Peptide/Protein Libraries Fused to the λ Repressor DNA-Binding Domain in E. coli Cells. 2003; 205: 235–250.
  2. Arkady B. Khodursky, Brian J. Peter, Nicholas R. Cozzarelli, David Botstein. DNA microarray analysis of gene expression in response to physiological and genetic changes that affect tryptophan metabolism in Escherichia coli. 2000 October 24; 97(22): 12170–12175. Published online 2000 October 10.