Section 4.4 The Three Roles of RNA in Protein Synthesis
Although DNA stores the information for protein synthesis and RNA carries out the instructions encoded in DNA, most biological activities are carried out by proteins. The accurate synthesis of proteins thus is critical to the proper functioning of cells and organisms. We saw in Chapter 3 that the linear order of amino acids in each protein determines its three-dimensional structure and activity. For this reason, assembly of amino acids in their correct order, as encoded in DNA, is the key to production of functional proteins.
Three kinds of RNA molecules perform different but cooperative functions in protein synthesis (Figure 4-20):
1. Messenger RNA (mRNA) carries the genetic information copied from DNA in the form of a series of three-base code “words,” each of which specifies a particular amino acid.
2. Transfer RNA (tRNA) is the key to deciphering the code words in mRNA. Each type of amino acid has its own type of tRNA, which binds it and carries it to the growing end of a polypeptide chain if the next code word on mRNA calls for it. The correct tRNA with its attached amino acid is selected at each step because each specific tRNA molecule contains a three-base sequence that can base-pair with its complementary code word in the mRNA.
3. Ribosomal RNA (rRNA) associates with a set of proteins to form ribosomes. These complex structures, which physically move along an mRNA molecule, catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis. Ribosomes are composed of a large and a small subunit, each of which contains its own rRNA molecule or molecules.
Translation is the whole process by which the base sequence of an mRNA is used to order and to join the amino acids in a protein. The three types of RNA participate in this essential protein-synthesizing pathway in all cells; in fact, the development of the three distinct functions of RNA was probably the molecular key to the origin of life. How each RNA carries out its specific task is discussed in this section, while the biochemical events in protein synthesis and the required protein factors are described in the final section of the chapter.
Messenger RNA Carries Information from DNA in a Three-Letter Genetic Code
RNA contains ribonucleotides of adenine, cytosine, guanine, and uracil; DNA contains deoxyribonucleotides of adenine, cytosine, guanine, and thymine. Because 4 nucleotides, taken individually, could represent only 4 of the 20 possible amino acids in coding the linear arrangement in proteins, a group of nucleotides is required to represent each amino acid. The code employed must be capable of specifying at least 20 words (i.e., amino acids).
If two nucleotides were used to code for one amino acid, then only 16 (or 4²) different code words could be formed, which would be an insufficient number. However, if a group of three nucleotides is used for each code word, then 64 (or 4³) code words can be formed. Any code using groups of three or more nucleotides will have more than enough units to encode 20 amino acids. Many such coding systems are mathematically possible. However, the actual genetic code used by cells is a triplet code, with every three nucleotides being “read” from a specified starting point in the mRNA. Each triplet is called a codon. Of the 64 possible codons in the genetic code, 61 specify individual amino acids and three are stop codons. Table 4-2 shows that most amino acids are encoded by more than one codon. Only two, methionine and tryptophan, have a single codon; at the other extreme, leucine, serine, and arginine are each specified by six different codons. The different codons for a given amino acid are said to be synonymous. The code itself is termed degenerate, which means that it contains redundancies.
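The counting argument and the degeneracy figures quoted above can be checked with a short Python sketch. The one-letter amino acid string is the standard genetic code laid out with the bases in the order U, C, A, G (third base varying fastest, `*` marking a stop codon); the function and variable names are illustrative assumptions, not part of the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in U, C, A, G order; '*' = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def code_words(word_length):
    """Number of distinct code words of a given length over four bases."""
    return len(BASES) ** word_length


# Map every triplet (e.g. "AUG") to its one-letter amino acid (e.g. "M").
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)
six_codons = sorted(aa for aa, n in degeneracy.items() if n == 6)

print(code_words(2), code_words(3))   # doublets vs. triplets
print(sum(degeneracy.values()))       # codons that specify amino acids
print(stop_codons, single_codon, six_codons)
```

Here `M` and `W` are methionine and tryptophan, and `L`, `R`, and `S` are leucine, arginine, and serine.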
Synthesis of all protein chains in prokaryotic and eukaryotic cells begins with the amino acid methionine. In most mRNAs, the start (initiator) codon specifying this amino-terminal methionine is AUG. In a few bacterial mRNAs, GUG is used as the initiator codon, and CUG occasionally is used as an initiator codon for methionine in eukaryotes. The three codons UAA, UGA, and UAG do not specify amino acids but constitute stop (terminator) signals that mark the carboxyl terminus of protein chains in almost all cells. The sequence of codons that runs from a specific start site to a terminating codon is called a reading frame. This precise linear array of ribonucleotides in groups of three in mRNA specifies the precise linear sequence of amino acids in a protein and also signals where synthesis of the protein chain starts and stops.
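As a minimal sketch of how a reading frame is decoded, the following Python function reads triplets from a start codon until it meets a stop codon. Choosing the first AUG in the sequence as the start site is a simplifying assumption made only for this illustration, and the example mRNA is invented.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in U, C, A, G order; '*' = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG (assumed start) to the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: GG | AUG UUU CCC AAA UGG UAA | CC
print(translate("GGAUGUUUCCCAAAUGGUAACC"))  # -> MFPKW
```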
Because the genetic code is a commaless, nonoverlapping triplet code, a particular mRNA theoretically could be translated in three different reading frames. Indeed some mRNAs have been shown to contain overlapping information that can be translated in different reading frames, yielding different polypeptides (Figure 4-21). The vast majority of mRNAs, however, can be read in only one frame because stop codons encountered in the other two possible reading frames terminate translation before a functional protein is produced. Another unusual coding arrangement occurs because of frameshifting. In this case the protein-synthesizing machinery may read four nucleotides as one amino acid and then continue reading triplets, or it may back up one base and read all succeeding triplets in the new frame until termination of the chain occurs. These frameshifts are not common events, but a few dozen such instances are known.
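The three possible reading frames of a single sequence can be made concrete with a small Python sketch. The 12-nucleotide mRNA is invented for illustration, and the helper names are assumptions; each frame is simply the same sequence split into triplets starting at offset 0, 1, or 2.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in U, C, A, G order; '*' = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_in_frame(mrna, offset):
    """Split the sequence into complete triplets starting at the given offset."""
    return [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]


def read_frame(mrna, offset):
    """Decode every complete triplet in one frame, keeping '*' for stop codons."""
    return "".join(CODON_TABLE[c] for c in codons_in_frame(mrna, offset))


mrna = "AUGAAACCCUGA"  # hypothetical example sequence
for offset in range(3):
    print(offset, codons_in_frame(mrna, offset), read_frame(mrna, offset))
# frame 0: AUG AAA CCC UGA -> MKP*
# frame 1: UGA AAC CCU     -> *NP
# frame 2: GAA ACC CUG     -> ETL
```

In this example a stop codon appears immediately in frame 1, which illustrates how stop codons in the alternative frames can cut translation short.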
The meaning of each codon is the same in most known organisms — a strong argument that life on earth evolved only once. Recently the genetic code has been found to differ for a few codons in many mitochondria, in ciliated protozoans, and in Acetabularia, a single-celled plant. As shown in Table 4-3, most of these changes involve reading of normal stop codons as amino acids, not an exchange of one amino acid for another. It is now thought that these exceptions to the general code are later evolutionary developments; that is, at no single time was the code immutably fixed, although massive changes were not tolerated once a general code began to function early in evolution.
Experiments with Synthetic mRNAs and Trinucleotides Broke the Genetic Code
Having described the genetic code, we briefly recount how it was deciphered, one of the great triumphs of modern biochemistry. The underlying experimental work was carried out largely with cell-free bacterial extracts containing all the components necessary for protein synthesis (tRNAs, ribosomes, amino acids, and the energy-rich nucleotides ATP and GTP) except mRNA.
Initially, researchers added synthetic mRNAs containing a single type of nucleotide to such extracts and then determined the amino acid incorporated into the polypeptide that was formed. In the first successful experiment, synthetic mRNA composed only of U residues [poly(U)] yielded polypeptides made up only of phenylalanine. Thus it was concluded that a codon for phenylalanine consisted entirely of U’s. Likewise, experiments with poly(C) and poly(A) showed that a codon for proline contained only C’s and a codon for lysine only A’s (Figure 4-22). [Poly(G) did not work in this type of experiment because it assumes an unusable stacked structure that is not translated well.] Next, synthetic mRNAs composed of alternating bases were used. The results of these experiments not only revealed more codons but also demonstrated that codons are three bases long. The example of this approach illustrated in Figure 4-23 led to identification of ACA as the codon for threonine and CAC for histidine. Similar experiments with many such mixed polynucleotides revealed a substantial part of the genetic code.
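The logic of these synthetic-mRNA experiments can be reproduced in a short Python sketch, using the now-known standard code to predict what each polymer should yield. The polymer lengths and helper names are arbitrary assumptions for illustration. The second helper shows why an alternating polymer is informative: with code words of two or four bases, poly(AC) would contain only one distinct word and so could give only a homopolymer, whereas triplets give two alternating words.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in U, C, A, G order; '*' = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, offset=0):
    """Decode every complete triplet starting at the given offset."""
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(offset, len(mrna) - 2, 3)
    )


def distinct_words(mrna, word_length):
    """Set of code words seen when reading in words of the given length."""
    return {
        mrna[i:i + word_length]
        for i in range(0, len(mrna) - word_length + 1, word_length)
    }


print(translate("U" * 30))   # poly(U) -> FFFFFFFFFF (phenylalanine)
print(translate("C" * 30))   # poly(C) -> PPPPPPPPPP (proline)
print(translate("A" * 30))   # poly(A) -> KKKKKKKKKK (lysine)

poly_ac = "AC" * 15
print(translate(poly_ac))    # -> THTHTHTHTH (threonine/histidine alternate)
for n in (2, 3, 4):
    print(n, sorted(distinct_words(poly_ac, n)))
```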
The entire genetic code was finally worked out by a second type of experiment conducted by Marshall Nirenberg and his collaborators. In this approach, all the possible trinucleotides were tested for their ability to attract tRNAs attached to the 20 different amino acids found in natural proteins (Figure 4-24). In all, 61 of the 64 possible trinucleotides were found to code for a specific amino acid; the trinucleotides UAA, UGA, and UAG did not encode amino acids.
Although synthetic mRNAs were useful in deciphering the genetic code, in vitro protein synthesis from these mRNAs is very inefficient and yields polypeptides of variable size. Successful in vitro synthesis of a naturally occurring protein was achieved first when mRNA from bacteriophage F2 (a virus) was added to bacterial extracts, leading to formation of the coat, or capsid, protein (the “packaging” protein that covers the virus particle). Studies with such natural mRNAs established that AUG encodes methionine at the start of almost all proteins and is required for efficient initiation of protein synthesis, while the three trinucleotides (UAA, UGA, and UAG) that do not encode any amino acid act as stop codons, necessary for precise termination of synthesis.
The Folded Structure of tRNA Promotes Its Decoding Functions
The next step in understanding the flow of genetic information from DNA to protein was to determine how the nucleotide sequence of mRNA is converted into the amino acid sequence of protein. This decoding process requires two types of adapter molecules: tRNAs and enzymes called aminoacyl-tRNA synthetases. First we describe the role of tRNAs in decoding mRNA codons, and then examine how synthetases recognize tRNAs.
All tRNAs have two functions: to be chemically linked to a particular amino acid and to base-pair with a codon in mRNA so that the amino acid can be added to a growing peptide chain. Each tRNA molecule is recognized by one and only one of the 20 aminoacyl-tRNA synthetases. Likewise, each of these enzymes links one and only one of the 20 amino acids to a particular tRNA, forming an aminoacyl-tRNA. Once its correct amino acid is attached, a tRNA then recognizes a codon in mRNA, thereby delivering its amino acid to the growing polypeptide (Figure 4-25).
As studies on tRNA proceeded, 30 – 40 different tRNAs were identified in bacterial cells and as many as 50 – 100 in animal and plant cells. Thus the number of tRNAs in most cells is more than the number of amino acids found in proteins (20) and also differs from the number of codons in the genetic code (61). Consequently, many amino acids have more than one tRNA to which they can attach (explaining how there can be more tRNAs than amino acids); in addition, many tRNAs can attach to more than one codon (explaining how there can be more codons than tRNAs). As noted previously, most amino acids are encoded by more than one codon, requiring some tRNAs to recognize more than one codon.
The function of tRNA molecules, which are 70 – 80 nucleotides long, depends on their precise three-dimensional structures. In solution, all tRNA molecules fold into a similar stem-loop arrangement that resembles a cloverleaf when drawn in two dimensions (Figure 4-26a). The four stems are short double helices stabilized by Watson-Crick base pairing; three of the four stems have loops containing seven or eight bases at their ends, while the remaining, unlooped stem contains the free 3′ and 5′ ends of the chain. Three nucleotides termed the anticodon, located at the center of one loop, can form base pairs with the three complementary nucleotides forming a codon in mRNA. As discussed later, specific aminoacyl-tRNA synthetases recognize the surface structure of each tRNA for a specific amino acid and covalently attach the proper amino acid to the unlooped amino acid acceptor stem. The 3′ end of all tRNAs has the sequence CCA, which in most cases is added after synthesis and processing of the tRNA are complete. Viewed in three dimensions, the folded tRNA molecule has an L shape with the anticodon loop and acceptor stem forming the ends of the two arms (Figure 4-26b).
Besides addition of CCA at the 3′ terminus after a tRNA molecule is synthesized, several of its nucleic acid bases typically are modified. For example, most tRNAs are synthesized with a four-base sequence of UUCG near the middle of the molecule. The first uridylate is methylated to become a thymidylate; the second is rearranged into a pseudouridylate (abbreviated Ψ), in which the ribose is attached to carbon 5 instead of to nitrogen 1 of the uracil. These modifications produce a characteristic TΨCG loop in an unpaired region at approximately the same position in nearly all tRNAs (see Figure 4-26a).
Nonstandard Base Pairing Often Occurs between Codons and Anticodons
If perfect Watson-Crick base pairing were demanded between codons and anticodons, cells would have to contain exactly 61 different tRNA species, one for each codon that specifies an amino acid. As noted above, however, many cells contain fewer than 61 tRNAs. The explanation for the smaller number lies in the capability of a single tRNA anticodon to recognize more than one, but not necessarily every, codon corresponding to a given amino acid. This broader recognition can occur because of nonstandard pairing between bases in the so-called “wobble” position: the third base in an mRNA codon and the corresponding first base in its tRNA anticodon. Although the first and second bases of a codon form standard Watson-Crick base pairs with the third and second bases of the corresponding anticodon, four nonstandard interactions can occur between bases in the wobble position. Particularly important is the G·U base pair, which structurally fits almost as well as the standard G·C pair. Thus, a given anticodon in tRNA with G in the first (wobble) position can base-pair with the two corresponding codons that have either pyrimidine (C or U) in the third position (Figure 4-27). For example, the phenylalanine codons UUU and UUC (5′ → 3′) are both recognized by the tRNA that has GAA (5′ → 3′) as the anticodon. In fact, any two codons of the type NNPyr (N = any base; Pyr = pyrimidine) encode a single amino acid and are decoded by a single tRNA with G in the first (wobble) position of the anticodon.
Although adenine rarely is found in the anticodon wobble position, many tRNAs in plants and animals contain inosine (I), a deaminated product of adenine, at this position. Inosine can form nonstandard base pairs with A, C, and U (Figure 4-28). A tRNA with inosine in the wobble position thus can recognize the corresponding mRNA codons with A, C, or U in the third (wobble) position (see Figure 4-27). For this reason, inosine-containing tRNAs are heavily employed in translation of the synonymous codons that specify a single amino acid. For example, four of the six codons for leucine have a 3′ A, C, or U (see Table 4-2); these four codons are all recognized by the same tRNA (3′-GAI-5′), which has inosine in the wobble position of the anticodon (and thus recognizes CUA, CUC, and CUU), and uses a G·U pair in position 1 to recognize the UUA codon.
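The wobble rules can be sketched in Python as a lookup from the first anticodon base to the codon third bases it can pair with. The entries for G and I follow the text; the entries for C, A, and U come from the standard statement of Crick's wobble hypothesis and are an assumption added for completeness. The function name is illustrative, and the sketch models only the wobble position, not the position-1 G·U pairing mentioned for the UUA codon.

```python
# Standard Watson-Crick partners for the two non-wobble positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First (5') anticodon base -> codon third bases it can pair with.
# G and I are as described in the text; C, A, and U are assumed from
# the usual formulation of the wobble hypothesis.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "ACU"}


def codons_recognized(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read."""
    wobble_base, middle_base, last_base = anticodon
    first = WATSON_CRICK[last_base]      # codon base 1 pairs with anticodon base 3
    second = WATSON_CRICK[middle_base]   # codon base 2 pairs with anticodon base 2
    return sorted(first + second + third for third in WOBBLE[wobble_base])


print(codons_recognized("GAA"))  # phenylalanine tRNA -> ['UUC', 'UUU']
print(codons_recognized("IAG"))  # 3'-GAI-5' leucine tRNA -> ['CUA', 'CUC', 'CUU']
```

Note that the anticodon is passed 5′ → 3′, so the leucine tRNA written 3′-GAI-5′ in the text is entered as `"IAG"`.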
Aminoacyl-tRNA Synthetases Activate Amino Acids by Linking Them to tRNAs
Recognition of the codon or codons specifying a given amino acid by a particular tRNA is actually the second step in decoding the genetic message. The first step, attachment of the appropriate amino acid to a tRNA, is catalyzed by a specific aminoacyl-tRNA synthetase (see Figure 4-25). Each of the 20 different synthetases recognizes one amino acid and all its compatible, or cognate, tRNAs. These coupling enzymes link an amino acid to the free 2′ or 3′ hydroxyl of the adenosine at the 3′ terminus of tRNA molecules by a two-step ATP-requiring reaction (Figure 4-29). About half the aminoacyl-tRNA synthetases transfer the aminoacyl group to the 2′ hydroxyl of the terminal adenosine (class I), and about half to the 3′ hydroxyl (class II). In this reaction, the amino acid is linked to the tRNA by a high-energy bond and thus is said to be activated. The energy of this bond subsequently drives the formation of peptide bonds between adjacent amino acids in a growing polypeptide chain. The equilibrium of the aminoacylation reaction is driven further toward activation of the amino acid by hydrolysis of the high-energy phosphoanhydride bond in pyrophosphate. The overall reaction is
amino acid + ATP + tRNA → aminoacyl-tRNA + AMP + PPi
The amino acid sequences of the aminoacyl-tRNA synthetases (ARSs) from many organisms are now known, and the three-dimensional structures of over a dozen enzymes of both classes have been solved. Each of these enzymes has a rather precise binding site for ATP (GTP is not admitted and CTP and UTP are too small) and binding pockets for its specific amino acid. Class I and class II enzymes bind to opposite faces of the incoming tRNAs. The binding surfaces of class I enzymes tend to be somewhat complementary to those of class II enzymes. These different binding surfaces and the consequent alignment of bound tRNAs probably account in part for the difference in the hydroxyl group to which the aminoacyl group is transferred (Figure 4-30). Because some amino acids are so similar structurally, aminoacyl-tRNA synthetases sometimes make mistakes. These are corrected, however, by the enzymes themselves, which check the fit in the binding pockets and facilitate deacylation of any misacylated tRNAs. This crucial function helps guarantee that a tRNA delivers the correct amino acid to the protein-synthesizing machinery.
Each tRNA Molecule Is Recognized by a Specific Aminoacyl-tRNA Synthetase
The ability of aminoacyl-tRNA synthetases to recognize their correct cognate tRNAs is just as important to the accurate translation of the genetic code as codon-anticodon pairing. Once a tRNA is loaded with an amino acid, codon-anticodon pairing directs the tRNA into the proper ribosome site; if the wrong amino acid is attached to the tRNA, an error in protein synthesis results.
As noted already, each aminoacyl-tRNA synthetase can aminoacylate all the different tRNAs whose anticodons correspond to the same amino acid. Therefore, all these cognate tRNAs must have a similar binding site, or “identity element,” that is recognized by the synthetase. One approach for studying the identity elements in tRNAs that are recognized by aminoacyl-tRNA synthetases is to produce synthetic genes that encode tRNAs with normal and various mutant sequences by techniques discussed in Chapter 7. The normal and mutant tRNAs produced from such synthetic genes then can be tested for their ability to bind purified synthetases.
Very probably no single structure or sequence completely determines a specific tRNA identity. However, some important structural features of several E. coli tRNAs that allow their cognate synthetases to recognize them are known. Perhaps the most logical identity element in a tRNA molecule is the anticodon itself. Experiments in which the anticodons of methionine tRNA (tRNAMet) and valine tRNA (tRNAVal) were interchanged showed that the anticodon is of major importance in determining the identity of these two tRNAs. In addition, x-ray crystallographic analysis of the complex between glutamine aminoacyl-tRNA synthetase (GlnRS) and glutamine tRNA (tRNAGln) showed that each of the anticodon bases neatly fits into a separate, specific “pocket” in the three-dimensional structure of GlnRS. Thus this synthetase specifically recognizes the correct anticodon.
However, the anticodon may not be the principal identity element in other tRNAs (see Figure 4-30). Figure 4-31 shows the extent of base sequence conservation in E. coli tRNAs that become linked to the same amino acid. Identity elements are found in several regions, particularly the end of the acceptor arm. A simple case is presented by tRNAAla: a single G·U base pair (G3·U70) in the acceptor stem is necessary and sufficient for recognition of this tRNA by its cognate aminoacyl-tRNA synthetase. Solution of the three-dimensional structure of additional complexes between aminoacyl-tRNA synthetases and their cognate tRNAs should provide a clear understanding of the rules governing the recognition of tRNAs by specific synthetases.
Ribosomes Are Protein-Synthesizing Machines
If the many components that participate in translating mRNA had to interact in free solution, the likelihood of simultaneous collisions occurring would be so low that the rate of amino acid polymerization would be very slow. The efficiency of translation is greatly increased by the binding of the mRNA and the individual aminoacyl-tRNAs to the most abundant RNA-protein complex in the cell — the ribosome. This two-part machine directs the elongation of a polypeptide at a rate of three to five amino acids added per second. Small proteins of 100 – 200 amino acids are therefore made in a minute or less. On the other hand, it takes 2 to 3 hours to make the largest known protein, titin, which is found in muscle and contains 30,000 amino acid residues. The machine that accomplishes this task must be precise and persistent.
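The timing estimates above follow from dividing chain length by elongation rate, as this small Python sketch shows. The function name is an illustrative assumption; the rates and lengths are the ones quoted in the text.

```python
def synthesis_time_seconds(residues, rate_per_second):
    """Time to polymerize a chain at a constant elongation rate."""
    return residues / rate_per_second


# Elongation rates quoted in the text: three to five amino acids per second.
for name, residues in [("100-residue protein", 100),
                       ("200-residue protein", 200),
                       ("titin", 30_000)]:
    slow = synthesis_time_seconds(residues, 3)
    fast = synthesis_time_seconds(residues, 5)
    print(f"{name}: {fast:.0f} to {slow:.0f} s "
          f"({fast / 3600:.2f} to {slow / 3600:.2f} h)")
```

At these rates a 200-residue protein takes roughly 40 to 67 seconds, and titin takes roughly 1.7 to 2.8 hours, in line with the approximate figures given above.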
With the aid of the electron microscope, ribosomes were first discovered as discrete, rounded structures prominent in animal tissues secreting large amounts of protein; initially, however, they were not known to play a role in protein synthesis. Once reasonably pure ribosome preparations were obtained, radiolabeling experiments showed that radioactive amino acids first were incorporated into growing polypeptide chains associated with ribosomes before appearing in finished chains.
A ribosome is composed of several different ribosomal RNA (rRNA) molecules and more than 50 proteins, organized into a large subunit and a small subunit. The proteins in the two subunits differ, as do the molecules of rRNA. The small ribosomal subunit contains a single rRNA molecule, referred to as small rRNA; the large subunit contains a molecule of large rRNA and one molecule each of two much smaller rRNAs in eukaryotes (Figure 4-32). The ribosomal subunits and the rRNA molecules are commonly designated in svedbergs (S), a measure of the sedimentation rate of suspended particles centrifuged under standard conditions (Chapter 3). The lengths of the rRNA molecules, the quantity of proteins in each subunit, and consequently the sizes of the subunits differ in prokaryotic and eukaryotic cells. (The small and large rRNAs are about 1500 and 3000 nucleotides long in bacteria and about 1800 and 5000 nucleotides long in humans.) Perhaps of more interest than these differences are the great structural and functional similarities among ribosomes from all species. This consistency is another reflection of the common evolutionary origin of the most basic constituents of living cells.
The sequences of the small and large rRNAs from several thousand organisms are now known. Although the primary nucleotide sequences of these rRNAs vary considerably, the same parts of each type of rRNA theoretically can form base-paired stem-loops, generating a similar three-dimensional structure for each rRNA in all organisms. Evidence that such stem-loops occur in rRNA was obtained by treating rRNA with chemical agents that cross-link paired bases; the samples then were digested with enzymes that destroy single-stranded rRNA, but not any cross-linked (base-paired) regions. Finally, the intact, cross-linked rRNA that remained was collected and sequenced, thus identifying the stem-loops in the original rRNA. Experiments of this type have located about 45 stem-loops at similar positions in small rRNAs from many different prokaryotes and eukaryotes (Figure 4-33). An even larger number of regularly positioned stem-loops have been demonstrated in large rRNAs. All the ribosomal proteins have been identified and their sequences determined, and many have been shown to bind specific regions of rRNA. It seems clear that the fundamental protein-synthesizing machinery in all present-day cells arose only once and has been modified about a common plan during evolution.
During protein synthesis, a ribosome moves along an mRNA chain, interacts with various protein factors and tRNAs, and very likely even undergoes shape changes. Despite the complexity of the ribosome, great progress has been made both in determining the overall structure of bacterial ribosomes and in identifying reactive sites that bind specific proteins, mRNA, and tRNA and that participate in important steps in protein synthesis. Quite detailed models of the large and small ribosomal subunits from E. coli have been constructed based on cryoelectron microscope and neutron-scattering studies (Figure 4-34). These studies not only have determined the dimensions and overall shape of the ribosomal subunits, but also have localized the positions of tRNAs bound to the ribosome during protein chain elongation. Powerful chemical experiments have also helped unravel the complex interactions between proteins and RNAs. In a technique called footprinting, for example, ribosomes are treated with chemical reagents that modify single-stranded RNA unprotected by binding either to protein or to another RNA. If the total sequence of the RNA is known, then the modified nucleotides can be located within the molecule. (This technique, which is also useful for locating protein-binding sites in DNA, is described in Chapter 10.) Thus the overall structure and function of ribosomes during protein synthesis are finally, after 40 years, yielding to successful experiments. How these results aid in understanding the specific steps in protein synthesis is described in the next section.
SUMMARY
- Genetic information is copied into mRNA in the form of a commaless, nonoverlapping, degenerate triplet code. Each amino acid is encoded by one or more three-base sequences, or codons, in mRNA. Each codon specifies one amino acid, but most amino acids are encoded by multiple codons (see Table 4-2).
- The AUG codon for methionine is the most common start codon, specifying the amino acid at the NH2-terminus of a protein chain. Three codons function as stop codons and specify no amino acids.
- A reading frame, the uninterrupted sequence of codons in mRNA from a specific start codon to a stop codon, is translated into the linear sequence of amino acids in a protein.
- Decoding of the nucleotide sequence in mRNA into the amino acid sequence of proteins depends on transfer RNAs and aminoacyl-tRNA synthetases (see Figure 4-25).
- All tRNAs have a similar three-dimensional structure that includes an acceptor arm for attachment of a specific amino acid and a stem-loop with a three-base anticodon sequence at its end (see Figure 4-26). The anticodon can base-pair with its corresponding codon or codons in mRNA.
- Because of nonstandard interactions, a tRNA may base-pair with more than one mRNA codon, and conversely, a particular codon may base-pair with multiple tRNAs.
- Each of the 20 aminoacyl-tRNA synthetases recognizes a single amino acid and covalently links it to a cognate tRNA, forming an aminoacyl-tRNA (see Figure 4-29). This reaction activates the amino acid, so it can participate in peptide-bond formation.
- The composition of ribosomes — the large ribonucleoprotein complexes on which proteins are synthesized — is quite similar in all organisms (see Figure 4-32). All ribosomes are composed of a small and a large subunit. Each contains numerous different proteins and one rRNA (small or large). The large subunit also contains one accessory RNA (5S).
- Analogous rRNAs from many different species fold into quite similar three-dimensional structures containing numerous stem-loops and binding sites for proteins, mRNA, and tRNAs. As a ribosome moves along an mRNA, a region of the large rRNA molecule in each ribosome sequentially binds the aminoacylated ends of incoming tRNAs and probably catalyzes peptide-bond formation (see Figure 4-34).