Cellular and Molecular Biology for Research
Author: Ahmadreza Gharaeian
© Ahmadreza Gharaeian
Description
Cellular and Molecular Biology for Research is the podcast where complex textbooks stop gathering dust and start making sense. Each episode breaks down the dense chapters of cellular and molecular biology—DNA, signaling pathways, protein folding, experimental techniques—into clear explanations for students, early-career researchers, or anyone who wants to actually understand the science instead of just memorizing it. Think of it as your study buddy who reads the heavy stuff, translates the jargon, and hands you the key concepts (with a little less pain and a lot more clarity).
42 Episodes
Several approaches are available for identifying genes within a large, unsequenced DNA region. One method is the exon trap, which employs a specialized vector to selectively clone exons. Another involves using methylation-sensitive restriction enzymes to locate CpG islands, DNA regions containing unmethylated CpG sequences. Prior to the genomics era, geneticists mapped the Huntington disease gene (HD) to a region near the end of chromosome 4, subsequently using an exon trap to identify the gene itself. Advances in automated DNA sequencing methods have enabled molecular biologists to determine the base sequences of various organisms, from simple phages and bacteria to yeast, plants, animals, and humans. In the Human Genome Project, much of the mapping work utilized yeast artificial chromosomes (YACs), which are vectors containing a yeast origin of replication, a centromere, and two telomeres. These vectors can accommodate foreign DNA up to 1 million base pairs long, which replicates alongside the YAC. However, due to their superior stability and ease of use, bacterial artificial chromosomes (BACs) became the preferred tool for sequencing. BACs, derived from the F plasmid of E. coli, can accept DNA inserts up to approximately 300 kilobases, with an average insert size of about 150 kilobases. Mapping large genomes, such as the human genome, requires a set of landmarks (markers) to determine the positions of genes. While genes themselves can serve as markers, most markers consist of anonymous DNA segments like RFLPs, VNTRs, STSs (including ESTs), and microsatellites. Restriction fragment length polymorphisms (RFLPs) are variations in the lengths of DNA fragments produced by cutting DNA from different individuals with a restriction enzyme, often caused by the presence or absence of specific restriction sites.
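To make the RFLP idea concrete, here is a minimal Python sketch. It assumes toy sequences invented for illustration and places each cut at the start of the recognition site, which is a simplification (EcoRI actually cuts within GAATTC). The helper name `digest` is hypothetical and not taken from any library.

```python
def digest(dna, site):
    """Return fragment lengths after cutting dna at every occurrence of site.

    Simplification: the cut is placed at the first base of each site.
    """
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = dna.find(site, pos + 1)
    edges = [0] + cuts + [len(dna)]
    # Drop zero-length fragments (e.g. a site at the very start)
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


SITE = "GAATTC"  # EcoRI recognition sequence

# Toy sequences (assumed for illustration): individual B has lost the
# second site through a single base change (GAATTC -> GAGTTC).
individual_a = "AAAA" + SITE + "CCCCCC" + SITE + "TTTT"
individual_b = "AAAA" + SITE + "CCCCCC" + "GAGTTC" + "TTTT"

fragments_a = digest(individual_a, SITE)
fragments_b = digest(individual_b, SITE)
print(fragments_a)  # three fragments
print(fragments_b)  # two fragments; the missing site merges two into one
```

The different fragment-length patterns from the two individuals are what would show up as a polymorphism on a gel.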
Transposable elements, also known as transposons, are DNA segments capable of moving from one location to another within the genome. Some transposable elements replicate during the process, leaving one copy in the original position and inserting a new copy at a different site, while others move without replication, vacating the original site entirely. Bacterial transposons can be categorized as follows: (1) insertion sequences, such as IS1, which consist solely of the genes required for transposition and are flanked by inverted terminal repeats; and (2) transposons like Tn3, which resemble insertion sequences but include at least one additional gene, often conferring antibiotic resistance. Eukaryotic transposons exhibit diverse replication strategies. DNA transposons, such as Ds and Ac in maize or the P elements in Drosophila, function similarly to bacterial DNA transposons like Tn3. The immunoglobulin genes in mammals undergo rearrangement through a mechanism analogous to transposition. Vertebrate immune systems generate immense diversity in immunoglobulin production by assembling genes from two or three components selected from a heterogeneous pool. This process, called V(D)J recombination, relies on recombination signal sequences (RSSs) that include a heptamer and a nonamer separated by either 12-bp or 23-bp spacers. Recombination occurs exclusively between a 12 signal and a 23 signal, ensuring the incorporation of only one of each type of coding region into the assembled gene. Key players in human V(D)J recombination are RAG1 and RAG2, which create single-strand nicks in DNA adjacent to a 12 or 23 signal. This triggers a transesterification reaction where the newly formed 3'-hydroxyl group attacks the opposite strand, leading to a break and forming a hairpin at the end of the coding segment.
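The 12/23 rule described above can be written as a one-line check. The sketch below is illustrative only: the function name `can_recombine` and the segment labels are assumptions, and the spacer assignments are a simplified arrangement chosen to show how the rule forces one segment of each type into the assembled gene.

```python
def can_recombine(spacer_a, spacer_b):
    """12/23 rule: one signal must have a 12-bp spacer, the other a 23-bp spacer."""
    return {spacer_a, spacer_b} == {12, 23}


# Illustrative arrangement (an assumption for this sketch):
# segment label -> RSS spacer length in bp
rss_spacer = {"V": 23, "D": 12, "J": 23}

v_to_d = can_recombine(rss_spacer["V"], rss_spacer["D"])  # 23 with 12
d_to_j = can_recombine(rss_spacer["D"], rss_spacer["J"])  # 12 with 23
v_to_j = can_recombine(rss_spacer["V"], rss_spacer["J"])  # 23 with 23

print(v_to_d, d_to_j, v_to_j)
```

In this toy arrangement a V segment cannot join a J segment directly, so the D segment cannot be skipped.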
Homologous recombination is vital for life. In eukaryotic meiosis, it ensures proper separation of homologous chromosomes by locking them together and promotes genetic diversity in offspring by scrambling parental genes. In all life forms, it plays a crucial role in managing DNA damage. In E. coli, homologous recombination via the RecBCD pathway starts with the invasion of duplex DNA by single-stranded DNA from another duplex that has undergone a double-stranded break. This process begins with RecBCD's nuclease and helicase activities, which generate a free end by preferentially nicking DNA at Chi sites. The invading strand is then coated with RecA and SSB. RecA facilitates the pairing of the invading strand with its complementary homologous DNA, forming a D-loop, while SSB enhances recombination by melting secondary structures and preventing RecA from trapping such structures, which could inhibit subsequent strand exchange. Following this, RecBCD likely nicks the D-loop strand, creating a branched intermediate known as a Holliday junction. The RuvA–RuvB helicase catalyzes branch migration, moving the crossover of the Holliday junction to a favorable resolution site. Finally, RuvC resolves the Holliday junction by nicking two of its strands, producing either noncrossover recombinants with heteroduplex patches or two crossover recombinant DNAs. Meiotic recombination in yeast begins with double-stranded breaks (DSBs) created by two Spo11 molecules. These molecules work together to cleave both DNA strands at closely spaced sites through transesterification reactions involving active site tyrosines. This reaction forms covalent bonds between Spo11 and the newly created DSBs. Spo11 is subsequently released.
Primer synthesis in E. coli involves the primosome, which consists of the DNA helicase DnaB and the primase DnaG. The assembly of the primosome at the origin of replication, oriC, proceeds as follows: DnaA binds to oriC at specific sites known as dnaA boxes and collaborates with RNA polymerase and HU protein to melt a DNA region adjacent to the leftmost dnaA box. Subsequently, DnaB associates with the open complex and promotes the binding of the primase to complete the primosome. The primosome remains attached to the replisome, repeatedly initiating Okazaki fragment synthesis on the lagging strand. Additionally, DnaB exhibits helicase activity, unwinding the DNA as the replisome advances. The SV40 origin of replication is located adjacent to the viral transcription control region. Replication initiation relies on the viral large T antigen, which binds within the 64-bp minimal ori at two adjacent sites. This antigen also possesses helicase activity, creating a replication bubble within the minimal ori. Priming is performed by a primase associated with the host DNA polymerase α. Yeast origins of replication are found within autonomously replicating sequences (ARSs), which consist of four key regions: A, B1, B2, and B3. Region A, a 15-bp sequence, contains an 11-bp consensus sequence that is highly conserved across ARSs. Region B3 may contribute to a critical DNA bend within ARS1. The pol III holoenzyme synthesizes DNA at a rate of approximately 730 nucleotides per second in vitro, slightly slower than the nearly 1000 nucleotides per second observed in vivo. This enzyme is highly processive both in vitro and in vivo. The pol III core (αε or αεθ) alone lacks processivity and can only replicate short DNA segments before dissociating from the template. However, when combined with the β-subunit, the core achieves processive replication at a rate approaching 1000 nucleotides per second.
The β-subunit forms a dimer that takes on a ring-like structure, encircling the DNA.
Several principles govern DNA replication across most organisms: (1) Double-stranded DNA replicates in a semiconservative manner, where the parental strands separate and serve as templates for the synthesis of new, complementary strands. (2) DNA replication in E. coli and other organisms is at least semidiscontinuous. One strand, often considered to replicate continuously in the direction of the replication fork's movement, may actually replicate discontinuously. The other strand replicates discontinuously, forming 1–2 kb Okazaki fragments in the opposite direction, allowing both strands to be synthesized in the 5'→3' direction. (3) DNA replication initiation requires a primer. In E. coli, Okazaki fragments are initiated with RNA primers that are 10–12 nucleotides long. (4) Most bacterial and eukaryotic DNAs replicate bidirectionally, though some, like ColE1, replicate unidirectionally. Circular DNAs can replicate via the rolling circle mechanism, where one strand of the double-stranded DNA is nicked, and the 3'-end is extended using the intact strand as a template. This process displaces the 5'-end, and in phage λ, the displaced strand serves as a template for discontinuous, lagging strand synthesis. Pol I is a highly versatile enzyme with three distinct activities: DNA polymerase, 3'→5' exonuclease, and 5'→3' exonuclease. The first two activities reside on a large domain of the enzyme, while the third is on a smaller, separate domain. The large domain, known as the Klenow fragment, can be isolated through mild protease treatment, yielding two protein fragments with all three activities intact. The structure of the Klenow fragment includes a wide cleft for DNA binding, with the polymerase active site located far from the 3'→5' exonuclease active site. Among the three DNA polymerases in E. coli (Pol I, Pol II, and Pol III), only Pol III is essential for replication.
X-ray crystallography studies on bacterial ribosomes with and without tRNAs have revealed that tRNAs occupy the cleft between the two subunits. They interact with the 30S subunit through their anticodon ends and with the 50S subunit through their acceptor stems. The binding sites for tRNAs primarily consist of rRNA. The anticodons of tRNAs in the A and P sites come into close proximity, allowing base-pairing with adjacent codons in the mRNA bound to the 30S subunit, as the mRNA bends 45 degrees between the two codons. The acceptor stems of tRNAs in the A and P sites also approach each other closely, to within just 5 Å, in the peptidyl transferase pocket of the 50S subunit. Twelve contacts between the ribosomal subunits are visible. The crystal structure of the E. coli ribosome reveals two conformations that differ due to rigid body motions of ribosomal domains relative to each other. Specifically, the head of the 30S particle rotates by 6 degrees between the two conformations, and by 12 degrees when compared to the T. thermophilus ribosome. This rotation is likely part of the ratchet-like motion of the ribosome during translocation. The E. coli 30S subunit comprises a 16S rRNA and 21 proteins (S1–S21), while the 50S subunit contains a 5S rRNA, a 23S rRNA, and 34 proteins (L1–L34). Eukaryotic cytoplasmic ribosomes are larger and include more RNAs and proteins than their prokaryotic counterparts. Sequence studies of 16S rRNA proposed its secondary structure (intramolecular base pairing), which has been confirmed by X-ray crystallography studies. These studies reveal a 30S subunit with extensively base-paired 16S rRNA, whose shape essentially defines the particle's overall structure. Additionally, X-ray crystallography studies have identified the locations of most 30S ribosomal proteins. The 30S ribosomal subunit serves two primary roles. It facilitates accurate decoding of mRNA and contributes to the overall function of the ribosome during translation.
Messenger RNAs are read in the 5' to 3' direction, which is the same direction in which they are synthesized. Proteins are synthesized from the amino terminus to the carboxyl terminus, meaning the amino-terminal amino acid is added first. The genetic code consists of three-base sequences called codons in mRNA, which instruct the ribosome to incorporate specific amino acids into a polypeptide. The code is nonoverlapping, meaning each base is part of only one codon, and it lacks gaps or commas, with every base in the coding region of an mRNA being part of a codon. There are 64 codons in total, three of which are stop signals, while the remaining codons encode amino acids, making the code highly degenerate. The degeneracy of the genetic code is partially managed by isoaccepting tRNA species that bind the same amino acid but recognize different codons. Additionally, wobble pairing allows the third base of a codon to deviate slightly from its normal position, forming non-Watson–Crick base pairs with the anticodon. This enables a single aminoacyl-tRNA to pair with multiple codons. Wobble pairs include G–U (or I–U) and I–A. The genetic code is not strictly universal. In certain eukaryotic nuclei, mitochondria, and at least one bacterium, codons that serve as termination signals in the standard genetic code can instead encode amino acids such as tryptophan and glutamine. In some mitochondrial genomes, the meaning of codons is altered, switching from one amino acid to another. Despite these deviations, the altered codes remain closely related to the standard genetic code from which they likely evolved. Elongation occurs in three steps: (1) EF-Tu, bound with GTP, delivers an aminoacyl-tRNA to the ribosomal A site. (2) Peptidyl transferase forms a peptide bond between the peptide in the P site and the newly arrived aminoacyl-tRNA in the A site, extending the peptide by one amino acid and shifting it to the A site. (3) EF-G, in conjunction with GTP, translocates the growing peptide.
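The nonoverlapping, comma-free reading of codons can be sketched in a few lines of Python. This is only an illustration: it builds the standard genetic code from the conventional one-letter amino acid string ordered by U, C, A, G at each codon position, the function name `translate` and the example mRNA are made up for the sketch, and it ignores initiation context, wobble, and the variant codes mentioned above.

```python
BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, "*" = stop,
# ordered with the first base changing slowest (UUU, UUC, UUA, UUG, UCU, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna):
    """Read mrna 5'->3' in nonoverlapping triplets until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)

peptide = translate("AUGUUUGGCUAA")  # made-up example: AUG UUU GGC UAA
print(peptide, stop_codons, sense_codons)
```

With 61 sense codons for 20 amino acids, most amino acids are specified by more than one codon, which is the degeneracy described above.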
Two critical events precede protein synthesis. First, aminoacyl-tRNA synthetases attach amino acids to their respective tRNAs with high specificity through a two-step reaction that begins with the activation of the amino acid using AMP, derived from ATP. Second, ribosomes must dissociate into their subunits at the conclusion of each translation cycle. In bacteria, this dissociation is actively facilitated by RRF and EF-G, while IF3 binds to the free 30S subunit, preventing its reassociation with the 50S subunit to form a complete ribosome. The initiation codon in prokaryotes is typically AUG but can also be GUG or, more rarely, UUG. The initiating aminoacyl-tRNA is N-formyl-methionyl-tRNAfMet. N-formyl-methionine (fMet) is the first amino acid incorporated into a polypeptide chain, although it is often removed during protein maturation. The 30S initiation complex is formed by the association of a free 30S ribosomal subunit with mRNA and fMet-tRNAfMet. This binding depends on base pairing between the Shine-Dalgarno sequence, located just upstream of the initiation codon in mRNA, and a complementary sequence at the 3'-end of the 16S rRNA. IF3 mediates this interaction with the assistance of IF1 and IF2, which are all bound to the 30S subunit at this stage. IF2 plays a central role in promoting the binding of fMet-tRNAfMet to the 30S initiation complex, while the other two initiation factors provide essential support. GTP is required for IF2 binding under physiological IF2 concentrations, though it is not hydrolyzed during this process. The complete 30S initiation complex consists of one 30S ribosomal subunit, one molecule each of mRNA, fMet-tRNAfMet, GTP, IF1, IF2, and IF3. GTP hydrolysis occurs after the 50S subunit joins the 30S complex to form the functional 70S initiation complex.
Ribosomal RNAs are synthesized in the nucleoli of eukaryotic cells as precursors that require processing to yield mature rRNAs. The sequence of RNAs in the precursor is universally 18S, 5.8S, and 28S across all eukaryotes, although the precise sizes of the mature rRNAs differ among species. In human cells, the precursor is 45S, which undergoes a processing scheme that produces 41S, 32S, and 20S intermediates, with snoRNAs playing crucial roles in these steps. Extra nucleotides are removed from the 5'-ends of pre-tRNAs in a single step through endonucleolytic cleavage catalyzed by RNase P. Both bacterial and eukaryotic RNase P enzymes have a catalytic RNA subunit called M1 RNA. In E. coli, RNase II and polynucleotide phosphorylase cooperate to remove most of the additional nucleotides at the 3'-end of a tRNA precursor but halt at the +2 stage, with two extra nucleotides remaining. RNases PH and T are primarily responsible for removing the last two nucleotides. In eukaryotes, a single enzyme, tRNA 3'-processing endoribonuclease (3'-tRNase), performs the processing of the 3'-end of a pre-tRNA. Trypanosome mRNAs are generated through trans-splicing, which links a short leader exon with one of many independent coding exons. In trypanosomatid mitochondria, incomplete mRNAs require editing before translation. Editing occurs in the 3'→5' direction through sequential actions of one or more guide RNAs (gRNAs). These gRNAs bind to unedited mRNA regions, providing A's and G's as templates for inserting missing U's or deleting extra U's. In higher eukaryotes, including fruit flies and mammals, some adenosines in mRNAs must be post-transcriptionally deaminated to inosine for correct translation. This type of RNA editing is performed by enzymes called adenosine deaminases acting on RNAs (ADARs). Additionally, certain cytidines must be deaminated to uridine for accurate mRNA coding. Post-transcriptional gene regulation often involves such modifications to ensure proper gene expression.
Capping occurs in several steps: initially, RNA triphosphatase removes the terminal phosphate from pre-mRNA. Subsequently, guanylyl transferase adds the capping GMP derived from GTP, followed by two methyl transferases that methylate the N7 position of the capping guanosine and the 2'-hydroxyl group of the penultimate nucleotide. These processes take place early in transcription, before the RNA chain exceeds 30 nucleotides in length. The cap plays a crucial role in ensuring proper splicing of some pre-mRNAs, facilitating the transport of mature mRNAs out of the nucleus, protecting mRNA from degradation, and enhancing its translatability. Most eukaryotic mRNAs and their precursors possess a poly(A) tail approximately 250 nucleotides long at their 3'-ends, added post-transcriptionally by poly(A) polymerase. The poly(A) tail increases both the stability and translatability of the mRNA, with the relative importance of these effects differing across systems. Transcription of eukaryotic genes extends beyond the polyadenylation site, after which the transcript is cleaved and polyadenylated at the newly formed 3'-end. An efficient mammalian polyadenylation signal includes an AAUAAA motif about 20 nucleotides upstream of the polyadenylation site, followed 23–24 base pairs later by a GU-rich sequence and then a U-rich motif. Variations in these sequences influence polyadenylation efficiency, with plant signals allowing more flexibility around the AAUAAA motif than animal signals, and yeast signals rarely containing the AAUAAA motif. Polyadenylation involves both cleavage of the pre-mRNA and the addition of the poly(A) tail at the cleavage site. The cleavage process requires multiple proteins, including CPSF, CstF, CF I, CF II, poly(A) polymerase, and the CTD of the largest subunit of RNA polymerase II. Among these, CPSF-73 is responsible for cleaving the pre-mRNA.
Nuclear mRNA precursors undergo splicing through a lariat-shaped or branched intermediate. In addition to the consensus sequences at the 5′ and 3′ ends of nuclear introns, branchpoint consensus sequences are also present. In yeast, this sequence is almost invariant as UACUAAC, whereas in higher eukaryotes, the consensus sequence is more variable, represented as YNCURAC. In all cases, the branched nucleotide corresponds to the final A in the sequence. The yeast branchpoint sequence also determines which downstream AG serves as the 3′ splice site. Splicing occurs on a complex structure known as the spliceosome. Yeast and mammalian spliceosomes have sedimentation coefficients of approximately 40S and 60S, respectively. Genetic studies have revealed that base pairing between U1 snRNA and the 5′ splice site of an mRNA precursor is necessary but not sufficient for splicing. The U6 snRNP also forms a base-pairing association with the 5′ end of the intron, which begins before the formation of the lariat intermediate but may alter its nature after this initial step. This interaction between U6 and the splicing substrate is critical for the splicing process. Furthermore, U6 interacts with U2 during splicing. The U2 snRNA base-pairs with the conserved sequence at the splicing branchpoint, an interaction essential for splicing. Additionally, U2 forms significant base pairs with U6 to create a region referred to as helix I, which plays a role in aligning these snRNPs for the splicing process. The U4 snRNA base-pairs with U6, contributing to the splicing mechanism.
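The branchpoint consensus lends itself to a small pattern-matching sketch. The Python below expands YNCURAC using the usual nucleotide ambiguity codes (Y = C or U, R = A or G, N = any base) and reports the position of the final A in each match. The function name `find_branchpoints` and the example intron are hypothetical; a real branchpoint search would also weigh distance to the 3′ splice site and other context.

```python
import re

# YNCURAC with Y = [CU], N = [ACGU], R = [AG]; the lookahead lets
# overlapping candidate sites be reported as well.
BRANCHPOINT = re.compile(r"(?=([CU][ACGU]CU[AG]AC))")


def find_branchpoints(intron):
    """Return 0-based positions of the branched A (final A) in each consensus match."""
    # The final A sits at offset 5 within the 7-base consensus
    return [m.start() + 5 for m in BRANCHPOINT.finditer(intron)]


# Hypothetical intron fragment containing the yeast sequence UACUAAC
intron = "GUAUGUAAAAUACUAACAAUUUUUCAG"
branch_sites = find_branchpoints(intron)
print(branch_sites, [intron[i] for i in branch_sites])
```

The near-invariant yeast sequence UACUAAC is one instance of the looser YNCURAC pattern, so the same expression matches both.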
Eukaryotic DNA associates with basic protein molecules called histones to form nucleosomes. Each nucleosome consists of four pairs of histones (H2A, H2B, H3, and H4) arranged in a wedge-shaped disc, around which 146 base pairs (bp) of DNA are wrapped. Histone H1, which is not part of the core nucleosome, is more easily removed from chromatin than the core histones. In the second level of chromatin folding, both in vitro and presumably in vivo, a string of nucleosomes forms a 30-nanometer (nm) fiber. Studies indicate that this fiber exists in at least two forms within the nucleus: inactive chromatin, characterized by a high nucleosome repeat length (approximately 197 bp), tends to adopt a solenoid folding structure and interacts with histone H1, which stabilizes its structure. Conversely, active chromatin, with a lower nucleosome repeat length (around 167 bp), folds according to the two-start double helical model. The third level of chromatin condensation involves the formation of radial loop structures in eukaryotic chromosomes. The 30-nm fiber forms loops ranging from 35 to 85 kilobases (kb) in length, anchored to the chromosome's central matrix. Core histones (H2A, H2B, H3, and H4) assemble nucleosome cores on naked DNA. Transcription of a class II gene in reconstituted chromatin, with an average of one nucleosome core per 200 bp of DNA, shows approximately 75% repression compared to naked DNA. The remaining 25% activity is attributed to promoter sites not covered by nucleosome cores. Histone H1 further represses template activity beyond the core nucleosomes. This repression can be mitigated by transcription factors, some of which, like Sp1 and GAL4, act as both antirepressors (preventing repression by histone H1) and transcription activators. Others, such as the GAGA factor, function solely as antirepressors, likely competing with histone H1 for binding.
Eukaryotic activators consist of at least two domains: a DNA-binding domain and a transcription-activating domain. DNA-binding domains include motifs such as zinc modules, homeodomains, bZIP, or bHLH motifs. Transcription-activating domains can be acidic, glutamine-rich, or proline-rich. Zinc fingers are characterized by an antiparallel β-sheet followed by an α-helix. The β-sheet contains two cysteines, and the α-helix contains two histidines, which coordinate with a zinc ion to form the finger-shaped structure. This coordination facilitates specific recognition of the DNA target within the major groove. The DNA-binding motif of the GAL4 protein includes six cysteines that coordinate two zinc ions in a bimetal thiolate cluster. This motif features a short α-helix that extends into the DNA major groove, forming specific interactions. Additionally, the GAL4 monomer contains an α-helical dimerization motif that forms a parallel coiled coil with the α-helix of another GAL4 monomer. Type I nuclear receptors are located in the cytoplasm, bound to other proteins. Upon binding their hormone ligands, these receptors release their cytoplasmic partners, translocate to the nucleus, bind to enhancers, and function as activators. A representative example is the glucocorticoid receptor, which contains a DNA-binding domain with two zinc modules. One module provides DNA-binding residues in a recognition α-helix, while the other facilitates protein-protein interactions for dimer formation. These zinc modules use four cysteine residues to complex the zinc ion, unlike classical zinc fingers, which use two cysteines and two histidines. Homeodomains in eukaryotic activators contain a DNA-binding motif that operates similarly to the helix-turn-helix motifs in prokaryotes, where a recognition helix fits into the DNA major groove.
Transcription factors bind to class II promoters in vitro in the following sequence: (1) TFIID, with assistance from TFIIA, attaches to the TATA box. (2) TFIIB binds subsequently. (3) TFIIF facilitates the binding of RNA polymerase II. The remaining factors bind in this order: TFIIE and then TFIIH, creating the DABPolFEH preinitiation complex. Notably, TFIIA's involvement appears to be optional in vitro. TFIID is composed of a TATA-box-binding protein (TBP) and 13 additional polypeptides referred to as TBP-associated factors (TAFs). The TATA-box-binding domain of TBP is located within its C-terminal 180 amino acid fragment. The interaction between TBP and the TATA box occurs within the DNA minor groove. The saddle-like shape of TBP aligns with the DNA, and the underside of the "saddle" forces the minor groove open, bending the TATA box by approximately 80 degrees. TBP is essential for the transcription of most genes across all three classes, not limited to class II genes. Many TAFs are evolutionarily conserved across eukaryotes and serve multiple roles, including interacting with core promoter elements and gene-specific transcription factors. TAF1 and TAF2 enable TFIID to bind to initiator elements and downstream promoter elements (DPEs), allowing TBP to bind to certain TATA-less promoters. TAF1 and TAF4 facilitate TFIID's interaction with Sp1 bound to GC boxes upstream of the transcription start site, ensuring TBP binding to TATA-less promoters containing GC boxes. Different TAF combinations are required to respond to various transcription activators, particularly in higher eukaryotes. Additionally, TAF1 exhibits enzymatic activity as both a histone acetyltransferase and a protein kinase. However, TFIID is not universally required in higher eukaryotes. For instance, some Drosophila promoters require an alternative factor, TRF1, while others depend on a TBP-free TAF complex.
Eukaryotic nuclei house three distinct RNA polymerases, which can be separated using ion-exchange chromatography. RNA polymerase I resides in the nucleolus, while the other two are located in the nucleoplasm. Each of these polymerases performs specific transcriptional roles. Polymerase I synthesizes a large precursor to the major rRNAs (5.8S, 18S, and 28S in vertebrates). Polymerase II generates hnRNAs, precursors to mRNAs, as well as miRNA precursors and most small nuclear RNAs (snRNAs). Polymerase III is responsible for producing precursors of 5S rRNA, tRNAs, and various other small cellular and viral RNAs. The subunit structures of the three nuclear polymerases have been analyzed in several eukaryotes, revealing multiple subunits, including two large ones exceeding 100 kD in molecular mass. Common subunits appear in all three polymerases across eukaryotes. In yeast, the genes encoding all 12 RNA polymerase II subunits have been sequenced and subjected to mutation analysis. Among these subunits, three resemble the core subunits of bacterial RNA polymerases in structure and function, five are shared by all three nuclear polymerases, two are dispensable under normal conditions, and two do not fit into these categories. Subunit IIa, the primary product of the yeast RPB1 gene, can be converted to IIb in vitro through the proteolytic removal of the carboxyl-terminal domain (CTD), which consists of repeated heptapeptides. In vivo, subunit IIa is phosphorylated at two serines within the CTD heptad to form IIo. The enzyme containing the IIa subunit (polymerase IIA) binds to the promoter, while the enzyme with the IIo subunit (polymerase IIO) participates in transcript elongation. The structure of yeast pol II Δ4/7 reveals a deep cleft capable of accommodating a DNA template. The catalytic activity and functional mechanisms of these polymerases underscore their critical roles in eukaryotic transcription.
The repressors of the λ-like phages possess recognition helices that fit sideways into the major groove of the operator DNA. Specific amino acids on the DNA-facing side of the recognition helix establish precise contacts with bases in the operator, and these interactions determine the specificity of the protein-DNA binding. Altering these amino acids can modify the specificity of the repressor. Both the λ repressor and the Cro protein exhibit affinity for the same operators, but their microspecificities for OR1 or OR3 are defined by interactions between distinct amino acids in the recognition helices of the two proteins and the base pairs in the respective operators. The cocrystal structure of a λ repressor fragment bound to an operator fragment provides detailed insight into the protein-DNA interactions. The most critical contacts occur in the major groove, where amino acids on the recognition helix, along with other amino acids, form hydrogen bonds with the edges of DNA bases and the DNA backbone. Some of these hydrogen bonds are reinforced by hydrogen bond networks involving two amino acids and multiple sites on the DNA. The structural data derived from the cocrystal closely align with prior biochemical and genetic findings. X-ray crystallography of a phage 434 repressor fragment/operator-fragment complex reveals probable hydrogen bonding between amino acid residues in the recognition helix and base pairs in the operator. It also indicates a potential van der Waals interaction between an amino acid in the recognition helix and a base in the operator. The DNA in the complex deviates significantly from its typical regular shape, bending slightly to facilitate the necessary base/amino acid contacts. Additionally, the central region of the helix, between the two half-sites, is wound more tightly, while the outer regions are wound more loosely than usual. These structural deviations are facilitated by the base sequence of the operator.
Bacteria undergo significant shifts in transcription patterns during various processes, such as phage infection or sporulation, and have evolved multiple mechanisms to facilitate these changes. For instance, the transcription of phage SPO1 genes in infected B. subtilis cells follows a temporal sequence, where early genes are transcribed first, followed by middle genes, and finally late genes. This transition is regulated by phage-encoded sigma factors that associate with the host's core RNA polymerase and alter its specificity from early to middle to late genes. The host sigma factor is specific to the phage early genes, while the phage gp28 protein changes the specificity to middle genes, and gp33 and gp34 proteins direct specificity to late genes. When B. subtilis undergoes sporulation, an entirely new set of sporulation-specific genes is activated, while many vegetative genes are turned off. This switch primarily occurs at the transcriptional level and is mediated by several new sigma factors that displace the vegetative sigma factor from the core RNA polymerase, redirecting transcription to sporulation-specific genes. Each sigma factor recognizes its own preferred promoter sequence. Certain prokaryotic genes must be transcribed under conditions where two different sigma factors are active. These genes are equipped with dual promoters, each recognized by one of the sigma factors, ensuring their expression regardless of which factor is present and enabling differential regulation under varying conditions. For example, in E. coli, the heat shock response and responses to low nitrogen and starvation stress are regulated by alternative sigma factors (sigma32 (σH), sigma54 (σN), and sigma38 (σS)), which replace the primary sigma factor sigma70 (σA) and direct RNA polymerase to alternative promoters. Additionally, many sigma factors are regulated by anti-sigma factors that bind to specific sigma factors and inhibit their interaction with the core RNA polymerase.
Some of these anti-sigma factors are further regulated by additional mechanisms.
Lactose metabolism in E. coli depends on two essential proteins, β-galactosidase and galactoside permease. The genes encoding these proteins, along with a gene for another enzyme, are organized into a cluster and transcribed together from a single promoter, producing a polycistronic mRNA. These functionally related genes are therefore regulated collectively. The lac operon is controlled through both positive and negative regulatory mechanisms. Negative regulation works as follows: the operon remains inactive while the repressor is bound to the operator, blocking RNA polymerase from attaching to the promoter and transcribing the three lac genes. When glucose is depleted and lactose becomes available, the few existing molecules of lac operon enzymes convert lactose into allolactose, which functions as an inducer. Allolactose binds to the repressor, inducing a conformational change that causes the repressor to dissociate from the operator. Once the repressor is removed, RNA polymerase can transcribe the three lac genes. Genetic and biochemical studies have identified the two primary components of negative control in the lac operon: the operator and the repressor. Additionally, DNA sequencing has revealed two auxiliary lac operators, one upstream and one downstream of the main operator; all three operators are necessary for optimal repression. Positive regulation of the lac operon, as well as of other inducible operons encoding sugar-metabolizing enzymes, is mediated by the catabolite activator protein (CAP) in conjunction with cyclic AMP (cAMP). The CAP-cAMP complex enhances transcription. However, glucose suppresses cAMP levels, thereby inhibiting positive regulation. As a result, the lac operon becomes fully active only when glucose levels are low and the cell needs to metabolize an alternative energy source.
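The two layers of lac control described above can be summarized as a small truth table. The sketch below is a deliberately simplified boolean model, not a quantitative one: the function name `lac_operon_state` and the three output labels are assumptions made for illustration, and the "low" label for the glucose-plus-lactose case reflects the usual textbook simplification that transcription without CAP-cAMP is weak rather than absent.

```python
def lac_operon_state(glucose_present: bool, lactose_present: bool) -> str:
    """Return a qualitative transcription level for the lac operon.

    Simplified model (illustrative assumption, not a kinetic description):
    - Negative control: without lactose there is no allolactose, so the
      repressor stays on the operator and the operon is "off".
    - Positive control: glucose lowers cAMP, so the CAP-cAMP complex
      forms only when glucose is absent.
    """
    repressor_bound = not lactose_present   # no inducer, repressor stays bound
    cap_camp_bound = not glucose_present    # low glucose -> cAMP up -> CAP-cAMP forms

    if repressor_bound:
        return "off"
    return "high" if cap_camp_bound else "low"


if __name__ == "__main__":
    # Print the full truth table for the four nutrient conditions.
    for glucose in (True, False):
        for lactose in (True, False):
            state = lac_operon_state(glucose, lactose)
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state}")
```

Treating each regulator as an on/off switch ignores the auxiliary operators and basal expression, but it makes the key point visible: the repressor decides whether transcription is possible at all, and CAP-cAMP decides how strong it is.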
The catalytic agent in the transcription process is RNA polymerase. In E. coli, this enzyme consists of a core, which houses the fundamental transcription machinery, and a sigma factor (σ-factor), which guides the core to transcribe specific genes. The σ-factor facilitates the initiation of transcription by enabling the RNA polymerase holoenzyme to bind tightly to a promoter. This σ-dependent binding necessitates the localized melting of 10–17 base pairs of DNA near the transcription start site, forming an open promoter complex. By directing the holoenzyme to bind exclusively to certain promoters, the σ-factor determines which genes will be transcribed. Transcription initiation proceeds until 9 or 10 nucleotides are incorporated into the RNA, at which point the core transitions to an elongation-specific conformation, departs from the promoter, and continues with elongation. The σ-factor is generally released from the core polymerase, though not always immediately after promoter clearance, often exiting stochastically during elongation. The σ-factor can be reused by other core polymerases. Rifampicin sensitivity or resistance is governed by the core, not the σ-factor. E. coli RNA polymerase achieves abortive transcription through a mechanism called scrunching, in which downstream DNA is drawn into the polymerase without the polymerase physically moving, while retaining its grip on the promoter DNA. The scrunched DNA may store sufficient energy to enable the polymerase to dissociate from the promoter and initiate productive transcription. Prokaryotic promoters contain two key regions located approximately 10 and 35 base pairs upstream of the transcription start site. In E. coli, these regions have consensus sequences of TATAAT and TTGACA, respectively. Generally, the closer a promoter's sequences match these consensus sequences, the stronger the promoter will be. 
Some exceptionally strong promoters also feature an additional element, known as an UP element, upstream of the core promoter.
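The idea that promoter strength tracks similarity to the consensus can be made concrete by counting matching positions. The following is a minimal sketch under stated assumptions: the helper names and the example hexamers are hypothetical, and a simple match count is only a rough proxy, since real promoter strength also depends on spacing between the two boxes, UP elements, and other factors.

```python
# Consensus hexamers for the E. coli -35 and -10 boxes, as given in the text.
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def count_matches(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(box_35: str, box_10: str) -> int:
    """Total matches (0 to 12) across the -35 and -10 boxes."""
    return count_matches(box_35, CONSENSUS_35) + count_matches(box_10, CONSENSUS_10)


if __name__ == "__main__":
    # Illustrative hexamers only; they are not taken from the episode text.
    print(consensus_score("TTGACA", "TATAAT"))  # perfect consensus
    print(consensus_score("TTTACA", "TATGTT"))  # a weaker, partial match
```

A higher score here would suggest, but not prove, a stronger promoter.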
Methods for purifying proteins and nucleic acids are fundamental in molecular biology. DNA, RNA, and proteins of varying sizes can be effectively separated using gel electrophoresis. Agarose is the most commonly used gel for nucleic acid electrophoresis, while polyacrylamide is typically employed for protein electrophoresis. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) separates polypeptides based on their sizes. For higher resolution, two-dimensional gel electrophoresis is utilized, combining isoelectric focusing in the first dimension with SDS-PAGE in the second. Ion-exchange chromatography is another technique that separates substances, including proteins, according to their charges, often employing positively charged resins like DEAE-Sephadex. Labeled DNA or RNA probes can be hybridized to DNAs with identical or very similar sequences on a Southern blot. Modern DNA typing employs Southern blots and multiple DNA probes to detect variable sites in individual organisms, including humans. Additionally, labeled probes may be hybridized to entire chromosomes to identify specific genes or DNA sequences, a process known as in situ hybridization, or fluorescence in situ hybridization (FISH) when fluorescently labeled probes are used. Proteins in complex mixtures can be detected and quantified using immunoblots, or Western blots, where proteins are electrophoresed, transferred to a membrane, and probed with specific antibodies detected via labeled secondary antibodies or protein A. The Sanger DNA sequencing method relies on dideoxy nucleotides to terminate DNA synthesis, producing DNA fragments of varying sizes that can be analyzed by electrophoresis. The last base of each fragment is determined by the specific dideoxy nucleotide used to terminate the reaction, enabling fragments to be ordered by size, with each one being a single, known base longer than the previous.
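The logic of reading a Sanger gel can be sketched in a few lines. This is an idealized toy model, with hypothetical function names and an arbitrary example sequence: it assumes that each of the four dideoxy reactions yields exactly one fragment for every position where its base occurs in the newly synthesized strand, and it ignores primers, signal intensity, and gel resolution.

```python
def run_dideoxy_lanes(new_strand: str) -> dict[str, list[int]]:
    """Return, for each ddNTP lane, the fragment lengths it would contain.

    A fragment of length n appears in lane X when position n of the
    newly synthesized strand is base X (synthesis terminated there).
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(position)
    return lanes


def read_sequence(lanes: dict[str, list[int]]) -> str:
    """Order all fragments by size and read off their terminal bases."""
    fragments = [(length, base) for base, lengths in lanes.items() for length in lengths]
    fragments.sort()  # shortest fragment first, as read from the bottom of the gel
    return "".join(base for _, base in fragments)


if __name__ == "__main__":
    lanes = run_dideoxy_lanes("GATTACA")  # arbitrary example sequence
    print(lanes)
    print(read_sequence(lanes))
```

Because every fragment is one base longer than the one before it, sorting by length and noting which lane each fragment came from is enough to recover the sequence.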























