BioSystems 80 (2005) 175–184

The origin of the genetic code: theories and their relationships, a review

Massimo Di Giulio∗
Institute of Genetics and Biophysics ‘Adriano Buzzati-Traverso’, CNR, Via G. Marconi 10, 80125 Naples, Italy

Received 3 August 2004; received in revised form 12 November 2004; accepted 18 November 2004

∗ Tel.: +39 081 7257313; fax: +39 081 5936123. E-mail address: digiulio@igb.cnr.it.

Abstract

A review of the main theories proposed to explain the origin of the genetic code is presented. I analyze arguments and data in favour of different theories proposed to explain the origin of the organization of the genetic code. It is possible to suggest a mechanism that makes compatible the different theories of the origin of the code, even if these are based on a historical or physicochemical determinism and thus appear incompatible by definition. Finally, I discuss the question of why a given number of synonymous codons was attributed to the amino acids in the genetic code.

© 2004 Elsevier Ireland Ltd. All rights reserved.

Keywords: Genetic code theories; Coevolution; Stereochemistry; Error-minimization hypothesis; Codon plurality; Molecular weight of amino acids

1. Introduction

The theories suggested to explain the origin of the genetic code are of two kinds. One of them is based on a historical determinism (Wong, 1975), the others on a physicochemical determinism (Sonneborn, 1965; Woese et al., 1966; Woese, 1967). There is the weak possibility that an early origin of the genetic code indicates that it was based on physicochemical (stereochemical) forces (Di Giulio, 1998). Alternatively, a late origin of the code in the development of life might indicate that this origin was not based on these forces since ‘the system’ might have already abandoned the strictly physicochemical determinism.
It seems natural to think that a late phase of the origin of life witnessed the origin of the genetic code, as several dozen macromolecules are required to achieve the code, but we cannot be sure that this was the case (Di Giulio, 1998, in press).

In this review, I analyze arguments and data in favour of the different theories proposed to explain the origin of the organization of the genetic code. Furthermore, I try to answer the question of why a given number of synonymous codons is attributed to each amino acid in the genetic code. This question is not often considered in papers on the origin of the code.

0303-2647/$ – see front matter © 2004 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2004.11.005

2. The stereochemical theory and evidence in favour

The stereochemical theory claims that the origin of the genetic code must lie in the stereochemical interactions between anticodons or codons and amino acids (Crick, 1968). The theory suggests, for example, that asparagine must have been codified by the codons AAU or AAC because asparagine is somehow stereochemically correlated with these codons. Several models have been proposed which indeed seem to define a stereochemical relationship between anticodons or codons and amino acids (Gamow, 1954; Pelc and Welton, 1966; Welton and Pelc, 1966; Dunnill, 1966; Woese, 1967; Black, 1973, 1995; Melcher, 1974; Nelsestuen, 1978; Balasubramanian et al., 1980; Marlborough, 1980; Hendry et al., 1981; Shimizu, 1982; Yarus, 1991; Szathmary, 1993).

The first stereochemical model was suggested in 1954 by Gamow, before the discovery of the genetic code. Gamow (1954) proposed a ‘key and lock’ relation between amino acids and the rhomb-shaped ‘holes’ formed by various nucleotides in the DNA. This model has the elegant property of being able to encode only 20 amino acids.
Later, Melcher (1974) built models defining a stereochemical correlation between anticodon nucleotides and their amino acids. The feature of these models (Melcher, 1974) was the intercalation of the amino acid and the binding of the hydrogen atoms of the aliphatic amino acids, through hydrogen bonds, to the electrons of the bases. Balasubramanian et al. (1980) constructed models based on oligoribonucleotides of five residues having a purine at the 3′-end and a U at the 5′-end, and any combination of three bases in the middle. These pentaribonucleotides, which the authors (Balasubramanian et al., 1980) consider to be a proto-tRNA, are shown to have a conformation capable of receiving the corresponding amino acid. Hendry et al. (1981) constructed models by eliminating the second base of a codon in B-DNA in order to analyze the properties of the ‘cavities’ thus formed. The authors (Hendry et al., 1981) observed that the L-amino acids fit well into these cavities if the conventional physicochemical principles of hydrogen bonding and steric constraints are used. Shimizu (1982) proposed a model based on a complex of four bases on the tRNAs. These complexes are formed of the anticodon nucleotides and the discriminator base at the fourth position from the 3′-end of tRNAs. These complexes can have a lock and key relationship with the corresponding amino acid (Shimizu, 1982).

The heterogeneity of stereochemical models seems to suggest that there were interactions between amino acids and anticodons or codons or, more generally, with RNA or DNA. Whether or not these interactions were specific and led to the organization of the genetic code has yet to be proven. The model offered by Shimizu (1982) seems to be based on a variety of experimental evidence (see Szathmary, 1993). Furthermore, numerous hairpin structures housing anticodons have been constructed (Shimizu, 1995) so as to make these structures similar to the stereochemical model (Shimizu, 1982).
These hairpin structures were specifically aminoacylated with their cognate amino acids in the presence of aminoacyl-adenylate and a dipeptide, valyl-aspartic (Shimizu, 1995). The author claims that his results are compatible with a stereochemical origin of the genetic code (Shimizu, 1995). (However, the validity of this analysis (Shimizu, 1995) has been questioned (Larkin et al., 2001).)

Moreover, the stereochemical theory has received new strength thanks to the findings of Yarus’ group (Yarus, 1988, 1991, 1993, 1998, 2000; Yarus and Christian, 1989; Lozupone et al., 2003). Recent evidence on the stereochemical hypothesis suggests that a record of the genetic code’s origin is available today, in the structures of RNA binding sites for amino acids (Yarus, 2002). These data begin with the RNA structure of the Tetrahymena guanosine site for the self-splicing G cofactor (Yarus and Christian, 1989). When this site was located, it contained arginine codons and also bound arginine, which acted as a competitive inhibitor of self-splicing (Yarus, 1988). This finding could be generalized to the notion that amino acid binding sites made of ribonucleotides would generally contain an excess of the codons and anticodons, a hypothesis testable by selection of new aptamers. A recent review of selections for RNA affinity for six amino acids (arginine, glutamine, isoleucine, leucine, phenylalanine and tyrosine) found codons and anticodons in excess in 26 known binding sites, with a probability of the order of 10^−11 or smaller (assuming placement of triplets at random) (Yarus, 2002). This concentration of triplets is striking despite the fact that only one half of the amino acids (three of six: arginine, tyrosine and isoleucine) showed coding triplets in excess. For isoleucine, the argument can be carried a step further because the simplest binding site (having the fewest nucleotides), and therefore the most prevalent isoleucine-binding RNA under a variety of conditions, contains codons and anticodons within the binding site (Lozupone et al., 2003). These data can be combined to argue that a substantial fraction of the genetic code (for half of the amino acids, judging from this evidence) was derived by extracting parts of primordial amino acid binding structures, and that these essentially physicochemical coding assignments survived to the modern code (Yarus, 2002).

3. The physicochemical and ambiguity reduction theories: evidence in favour

The physicochemical theory claims that the force defining the origin of the genetic code structure was the one that tended to reduce the deleterious effects of physicochemical distances between amino acids codified by codons differing in one base (Sonneborn, 1965; Woese et al., 1966). In particular, Sonneborn (1965) identified the selective pressure reducing the deleterious effects of mutations as the force defining the amino acid allocations in the genetic code table (Ardell and Sella, 2001; Sella and Ardell, 2002). Woese et al. (1966), by contrast, maintained that the driving force defining genetic code organization must lie in a selective pressure tending to reduce the translation errors of the ancestral genetic message.

A similar theory is the ambiguity reduction hypothesis. This theory claims that groups of codons differing in one base were assigned to groups of physicochemically similar amino acids, and that the genetic code reached its current organization through the lowering of the ambiguity in the coding within and between groups of amino acids (Woese, 1965; Fitch and Upper, 1987). Only one study, conducted on 300 tRNA sequences specific for 8 amino acids (Fitch and Upper, 1987), is in favour of the ambiguity reduction theory (Woese, 1965; Fitch and Upper, 1987).
Other, equivalent analyses are in favour of the coevolution theory (Di Giulio, 1992a, 1994a, 1995; Chaley et al., 1999; Bermudez et al., 1999).

A point of view that differs in some aspects from these hypotheses concerns whether there was a physicochemical affinity between amino acids and the doublets/triplets coding for them, which might have organized the genetic code. Some studies suggest that this might have been the case (Weber and Lacey, 1978; Jungck, 1978; Lacey et al., 1992).

On the other hand, the physicochemical hypothesis (Sonneborn, 1965; Woese et al., 1966) is based on a lot of evidence suggesting a relationship between the physicochemical properties of amino acids and the structure of the genetic code (Pelc, 1965; Woese et al., 1966; Epstein, 1966; Goldberg and Wittes, 1966; Volkenstein, 1966; Alff-Steinberger, 1969; Nagyvary and Fendler, 1974; Nelsestuen, 1978; Weber and Lacey, 1978; Jungck, 1978; Wetzel, 1978; Wolfenden et al., 1979; Jurka et al., 1982; Lacey and Mullins, 1983; Swanson, 1984; Sjostrom and Wold, 1985; Taylor and Coates, 1989; Di Giulio, 1989a,b, 1991, 1992a, 1994a; Haig and Hurst, 1991; Lacey et al., 1992, 1993; Siemion and Stefanowicz, 1992; Szathmary, 1993; Goldman, 1993; Baumann and Oro, 1993; Di Giulio et al., 1994; Hartman, 1995; Di Giulio and Medugno, 1998, 1999, 2001; Freeland and Hurst, 1998a,b; Knight et al., 1999; Freeland et al., 2000a,b; Ardell and Sella, 2001, 2002; Sella and Ardell, 2002; Freeland et al., 2003; Zhu et al., 2003; Archetti, 2004). Therefore, at some stages in the origin of the genetic code, the properties of amino acids must have played a remarkable role in structuring its organization.

In particular, Freeland and Hurst (1998a) investigated: (1) the effect of weighting transition errors differently from transversion errors and (2) the effect of weighting each base differently, depending on reported mistranslation biases.
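The kind of comparison discussed in this section (scoring a code by the physicochemical distance between amino acids whose codons differ in one base, and then asking how many alternative codes score better) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure of the cited studies: the Kyte-Doolittle hydropathy scale is used as a stand-in property, all single-base changes are weighted equally (no transition/transversion or mistranslation biases), and alternative codes are generated by permuting the 20 amino acids among the existing synonymous codon blocks while keeping the stop codons fixed. The names `error_cost`, `shuffled_code` and `fraction_better` are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy values: an illustrative stand-in property (assumption).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def error_cost(code):
    """Mean squared property difference over all single-base changes
    that turn one sense codon into another sense codon."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = codon[:pos] + base + codon[pos + 1:]
                other = code[neighbour]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Random alternative code: permute the 20 amino acids among the
    synonymous codon blocks of the standard code; stops stay in place."""
    amino_acids = sorted(set(AMINO_ACIDS) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    mapping = dict(zip(amino_acids, permuted))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_better(n_samples=1000, seed=0):
    """Fraction of sampled alternative codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    reference = error_cost(STANDARD_CODE)
    better = sum(error_cost(shuffled_code(rng)) < reference for _ in range(n_samples))
    return better / n_samples


if __name__ == "__main__":
    print("cost of the standard code:", round(error_cost(STANDARD_CODE), 3))
    print("fraction of random codes with lower cost:", fraction_better())
```

Weighting transitions differently from transversions, or each codon position differently, would amount to multiplying each squared difference by a weight inside `error_cost`; the choice of weights is exactly what such studies vary, so none is fixed here.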
Freeland and Hurst concluded not only that the genetic code is extremely efficient at minimizing the effects of errors, but also that its structure reflects biases in these errors (Freeland and Hurst, 1998a,b). Freeland and Hurst (1998b), in turn, compared the error-minimizing ability of the genetic code with that of alternative codes which, rather than being a random selection, are restricted such that amino acids from the same biosynthetic pathway all share the same first base. They concluded that although on average the restricted set of codes shows a slightly higher efficiency than random ones, the real genetic code remains extremely efficient relative to this subset (P = 0.0003) (Freeland and Hurst, 1998b). This indicates that, for the most part, historical factors do not explain the load-minimization property of the genetic code (Freeland and Hurst, 1998b). Furthermore, the importance of selection is supported by the finding that the genetic code’s efficiency improves relative to that of historically related codes after allowance is made for realistic mutational and mistranslational biases (Freeland and Hurst, 1998b). Once mistranslational biases have been considered, fewer than 4 per 100,000 alternative codes are better than the current genetic code (Freeland and Hurst, 1998b). Also, Freeland et al. (2000b) have shown that if theoretically possible genetic code structures reflect plausible biological constraints, and amino acid dissimilarity is quantified using data on substitution frequencies, then the code is at or very close to a global optimum for error minimization. (However, the validity of this analysis (Freeland et al., 2000b) has been questioned (Di Giulio, 2001).) More recently, Freeland et al. (2003) have further reinforced the error-minimizing hypothesis.

4. The coevolution theory and evidence in favour

The coevolution hypothesis of the origin of the genetic code (Wong, 1975) suggests that the origin of the genetic code should be sought in the biosynthetic relationships between amino acids. In particular, this hypothesis maintains that early on in the genetic code only a few amino acids (perhaps five) were codified: the precursors (Wong, 1975). As the other amino acids arose biosynthetically from these precursors, part or all of the codon domain of the precursor amino acid was passed to the product amino acids (Wong, 1975). The coevolution theory postulates that the mechanism through which the precursor amino acids passed part or all of their codon domain to the product amino acids operated on tRNA-like molecules, on which, the theory suggests, the biosynthetic transformations between amino acids took place (Wong, 1975). If the biosynthetic pathways linking up the amino acids took place on tRNA-like molecules, then a tRNA-like molecule bearing a product amino acid, evolved from the biosynthetic transformation of a given precursor amino acid, must clearly have recognized some of the codons belonging to the precursor. Therefore, this molecule was able to evolve naturally towards a tRNA specific for that particular product amino acid and its reassigned codons.
There are a number of observations and suggestions pointing out that the biosynthetic pathways of amino acids played a fundamental role in structuring the organization of the genetic code table (Nirenberg et al., 1963; Pelc, 1965; Jukes, 1966; Dillon, 1973; Wong, 1975, 1976, 1980, 1981; Brack and Orgel, 1975; McClendon, 1986, 1987; Jurka and Smith, 1987a,b; Wachtershauser, 1988; de Duve, 1991; Miseta, 1989; Taylor and Coates, 1989; Danchin, 1989; Szathmary and Zintzaras, 1992; Morowitz, 1992; Szathmary, 1993; Di Giulio, 1992a,b, 1993, 1994, 1995, 1996, 1997a–c, 1999, 2000a,b, 2001, 2002, 2003; Di Giulio and Medugno, 1998, 1999, 2000, 2001; Di Giulio et al., 1994; Edwards, 1996; Chaley et al., 1999; Bermudez et al., 1999; Ardell and Sella, 2002; Zhu et al., 2003; Archetti, 2004; Klipcan and Safro, 2004), thus corroborating the coevolution theory.

In particular, since the genetic code makes possible the transformation of mRNA into proteins, a conjecture claims that some fundamental themes of protein structure must be reflected in the genetic code table, as these themes might have been the main selective pressure promoting the organization of the genetic code. Di Giulio (1996) has tried to clarify how the physicochemical properties of amino acids are shared among the pairs of amino acids that are in a precursor–product relationship and those that are not but which are nevertheless defined in the genetic code. He found that the pairs in precursor–product relationships reflect the β-sheets of proteins through the ‘size’ of amino acids (Di Giulio, 1996). This study would seem to have identified the β-sheets as the main adaptive theme promoting the organization of the genetic code. Furthermore, as the β-sheets of proteins are linked to the precursor–product relationships, it would also seem to provide strong evidence in favour of the coevolution hypothesis.
Moreover, I believe that some molecular fossils that strongly corroborate the coevolution theory are the main evidence in favour of it. Table 1 shows the biosynthetic pathways taking place on tRNAs which, in actual organisms, transform one amino acid into another. For example, in the pathway Asp-tRNAAsn → Asn-tRNAAsn an aspartic acid molecule is loaded onto a tRNA specific for asparagine and a second enzyme transforms aspartic acid into asparagine. The tRNA loaded with asparagine in this way is accepted by the ribosome and takes part in protein synthesis. There have been several interpretations of these pathways (Ardell, 1998; Poole et al., 1998; Cavalier-Smith, 2001); it is more than likely that they are fossils of a metabolic state (Wong, 1976, 1988; Wachtershauser, 1988; de Duve, 1988, 1991; Benner et al., 1989; Danchin, 1989; Di Giulio, 1992a,b, 1993, 1997a–c, 1999, 2000a–d, 2002; Edwards, 1996).

Table 1
The pathways currently transforming one amino acid into another while charged on a tRNA, together with their phylogenetic distribution

Pathway                          Phylogenetic distribution
Glu-tRNAGln → Gln-tRNAGln        Bacteria and archaea
Asp-tRNAAsn → Asn-tRNAAsn        Bacteria (present in minority) and archaea
Ser-tRNASec → Sec-tRNASec        Bacteria, archaea, and eucarya
Met-tRNAfMet → fMet-tRNAfMet     Bacteria and organelles
Lys-tRNAPyl → Pyl-tRNAPyl        Some archaea and bacteria

For the original literature, see Bock et al. (1991), Ibba et al. (1997), Tumbula et al. (2000) and Ibba and Soll (2002).

Di Giulio (2002) performed an evolutionary analysis which seems to establish that these pathways (Table 1) are ancient, possibly molecular fossils of the mechanism that gave rise to the evolutionary organization of the genetic code.
It seems to me that the correspondence between the mechanism hypothesized by the coevolution theory (based on the transformation of the precursor amino acid into the product while they were loaded on a tRNA-like molecule (Wong, 1975)) and the presence of these pathways in actual organisms (Table 1) is surprising, and that it therefore constitutes an extremely strong corroboration of this hypothesis. This is because, as molecular fossils, these pathways (Table 1) would provide evidence of significant value: they would carry a high content of the history of the early phases of life on earth, and they might, therefore, record the primordial stages of the origin of the genetic code (Di Giulio, 2002).

5. Relationships between genetic code theories

The relationships between the stereochemical and the physicochemical theories are sufficiently clear: for example, the observations linking the physicochemical properties of amino acids to the properties of dinucleoside monophosphates (Weber and Lacey, 1978; Jungck, 1978; Lacey et al., 1992) may be expressions of stereochemical interactions. The relationship between the coevolution theory and the stereochemical or physicochemical theories, by contrast, seems to be much less clear (Di Giulio, 1997b). The coevolution theory seems to be compatible with some aspects of the physicochemical and ambiguity reduction theories. Indeed, if there was a selective pressure tending to organize the code in columns (Nelsestuen, 1978; Wolfenden et al., 1979; Sjostrom and Wold, 1985; Di Giulio, 1989a; Taylor and Coates, 1989), then as the precursor amino acids gradually conceded part of their codon domain to the product amino acids (Wong, 1975, 1988), the latter were attributed codons in the genetic code in such a way that physicochemically similar amino acids were assigned to the same code column.
However, if we assume that the observations regarding a relationship between the physicochemical properties of amino acids and those of anticodons are true (Weber and Lacey, 1978; Jungck, 1978), then we introduce constraints that generate difficulties, which require explanations (Di Giulio, 1997b). If the allocations of amino acids in the genetic code stem primarily from the biosynthetic relationships between amino acids, as predicted by the coevolution theory (Wong, 1975), then the initial attribution of codon domains to the precursor amino acids implies that the assignment of codons to product amino acids was entirely defined prior to their biosynthetic appearance. Thus, we have to reconcile the physicochemical correlations between the product amino acids and their anticodons (or codons) (Weber and Lacey, 1978; Jungck, 1978; Di Giulio, 1992, 1996, 1997) with the observation that the latter were assigned before the biosynthetic appearance of the product amino acids (Di Giulio, 1997). It is clear that the mechanism of codon concession from the precursors to the products creates difficulties, because the evolving product amino acids took pre-assigned codons and yet the physicochemical correlations between product amino acids and anticodons (or codons) must also hold.

To solve this difficulty, the hypothesis of the coevolution between the origin of anticodons (and/or codons) and the evolution of amino acids was suggested (Di Giulio, 1998). If the organization of the genetic code was defined by: (1) the biosynthetic relationships between amino acids and (2) the anticodon–amino acid interactions, then it is reasonable to assume that there might have been a coevolution between the origin of anticodons and the biosynthetic pathways of amino acids, in such a way that the anticodons came into contact with the amino acids on RNA hairpin structures (Hopfield, 1978; Di Giulio, 1992, 1994, 1995, 1996; Shimizu, 1995).
(In a model of the origin of the tRNA molecule, the anticodons/codons are in the stems of hairpin structures near the amino acid attachment site (Di Giulio, 1992, 1995).) The coevolution theory claims that the codon domains of the precursor amino acids must have been pre-assigned so as to ensure the contiguity of the amino acids in precursor–product relationships (Wong, 1975). Thus, it becomes easier to explain why the physicochemical properties of amino acids are reflected in the genetic code together with the biosynthetic relationships of amino acids, and why these two forces did not act independently (Di Giulio, 1992, 1996, 1997b). In fact, we simply need to postulate that the evolving anticodons still belonging to the precursor amino acids played an active role in selecting the emergent product amino acids. For instance, in the biosynthetic transformation Glu → Pro, the process of exploring the products of Glu terminated, as far as the anticodons NGG are concerned, only when an amino acid (Pro) capable of physicochemical interaction with these anticodons developed. Therefore, this model (Di Giulio, 1998) considers that the product amino acids were not at all specified in the early phase of genetic code origin (Wong, 1975); their selection did indeed depend on anticodons, but it was also determined by the history of the biosynthetic relationships between amino acids. This model (Di Giulio, 1998), which thus sees the evolving anticodons playing an active role in the selection of the product amino acid as it affects and directs the biosynthetic transformations of the precursor amino acid, makes the coevolution theory (Wong, 1975) compatible with the stereochemical theory (Woese, 1967) and with part of the physicochemical theory (Weber and Lacey, 1978; Jungck, 1978; Lacey and Mullins, 1983; Lacey et al., 1992).
Finally, although this model (Di Giulio, 1998) makes the theories of the origin of the genetic code compatible, I nevertheless believe that the coevolution theory and the stereochemical theory are incompatible, since they are based on different kinds of determinism: historical for the former, physicochemical for the latter.

6. Why was a given number of synonymous codons attributed to the amino acids in the genetic code?

In the genetic code table, we have to explain why the various amino acids are codified by different numbers of synonymous codons. A negative correlation has been repeatedly observed between the number of codons specifying amino acids in the genetic code and the ‘size’ of amino acids (Hasegawa and Miyata, 1980; Di Giulio, 1989a; Taylor and Coates, 1989; Dufton, 1997). For instance, there is a negative correlation between the number of codons and the molecular weight of amino acids (Fig. 1; Hasegawa and Miyata, 1980; Di Giulio, 1989a; Taylor and Coates, 1989).

Fig. 1. Relation between the number of codons attributed to amino acids in the genetic code and the molecular weight of amino acids (Hasegawa and Miyata, 1980; Di Giulio, 1989a; Taylor and Coates, 1989; Dufton, 1997).

If arginine, which seems subject to particular selective constraints in mesophiles (Jukes, 1978), is eliminated from this correlation (Fig. 1), we obtain an increase in the significance (Di Giulio, 1989a). Therefore, for arginine this correlation does not hold; in fact, the inverse holds. It is likely that arginine has a large number of synonymous codons in the genetic code because this amino acid has the highest thermophily rank (Di Giulio, 2000a–d), and under the hypothesis of a hot origin of life there was a selective advantage in attributing six codons to arginine (Di Giulio, 2000a–d).
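The correlation between codon number and molecular weight, and the effect of removing arginine from it, can be checked with a short calculation. The sketch below is an illustration under assumptions rather than a reproduction of the cited analyses: it uses the codon counts of the standard genetic code, the molecular weights of the free amino acids (g/mol, rounded), and the Pearson coefficient, whereas the statistic used in the original studies is not stated here. The function names are hypothetical.

```python
from math import sqrt

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "G": 4, "A": 4, "S": 6, "P": 4, "V": 4, "T": 4, "C": 2, "L": 6, "I": 3,
    "N": 2, "D": 2, "Q": 2, "K": 2, "E": 2, "M": 1, "H": 2, "F": 2, "R": 6,
    "Y": 2, "W": 1,
}

# Molecular weights of the free amino acids in g/mol (rounded values).
MOLECULAR_WEIGHT = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def codon_weight_correlation(exclude=()):
    """Correlation between molecular weight and codon number,
    optionally leaving out some amino acids (one-letter codes)."""
    kept = [aa for aa in CODON_COUNT if aa not in exclude]
    weights = [MOLECULAR_WEIGHT[aa] for aa in kept]
    counts = [CODON_COUNT[aa] for aa in kept]
    return pearson(weights, counts)


if __name__ == "__main__":
    print("all 20 amino acids:", round(codon_weight_correlation(), 3))
    print("without arginine:  ", round(codon_weight_correlation(exclude=("R",)), 3))
```

With these inputs the coefficient is negative for the full set and becomes more negative when arginine is left out, which is the direction of the effect described in the text.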
Furthermore, Di Giulio (in press) has observed that there is a statistically significant positive correlation between the number of codons attributed to amino acids in the genetic code and the values of the hydrostatic pressure asymmetry index (PAI) of amino acids (i.e. a measurement of the barophilicity of amino acids). In other words, the more barophilic amino acids have, on average, a larger number of codons than the amino acids less used in barophiles (Di Giulio, in press). Therefore, the significant and negative correlation between the number of codons and the molecular weight of amino acids (Fig. 1) seems to be nothing other than an expression of barophily, by means of the negative correlation between the PAI of amino acids and their molecular weight (Di Giulio, in press). This is because these two latter correlations would imply the observed positive correlation between the number of codons and the PAI values (Di Giulio, in press). High hydrostatic pressure was probably the main selective force making the genetic code attribute a given number of codons to amino acids and, in particular, making it attribute, on average, more codons to the ‘smaller’ amino acids, i.e. the more barophilic ones (Di Giulio, in press). The molecular weight, i.e. the ‘size’ of amino acids, was in turn probably the property on which natural selection acted to favour the construction of proteins that were simply more stable at high hydrostatic pressure (Di Giulio, in press).

7. Synthesis: the coevolution theory and the ancestral metabolism

The coevolution theory of the origin of the genetic code identifies a tRNA-like molecule as a device through which precursor amino acids ceded part or all of their codons to the product amino acids derived from the former (Wong, 1975; Wachtershauser, 1988; de Duve, 1991; Danchin, 1989; Di Giulio, 1994b).
This perhaps suggests that the ancestral metabolism of amino acids took place on tRNA-like molecules. However, there is no a priori reason why this should have been limited only to amino acid metabolism. Indeed, the coevolution theory would seem to imply that the whole primitive metabolism took place on tRNA-like molecules (Di Giulio, 1994b). In actual fact, this generalization of the coevolution theory (Di Giulio, 1994b) has also been suggested following completely different arguments (Tyagi, 1981; Crothers, 1982; Cedergren and Grosjean, 1987; Edwards, 1989; Gibson and Lamond, 1990; Lamond and Gibson, 1990; Szathmary, 1999). Therefore, the mechanism on which the coevolution theory is based might be a manifestation of a much more general bond between ancestral metabolism and tRNA-like molecules (Danchin, 1989). Moreover, this link is assumed to have remained visible in the correspondence between the biosynthetic pathways of amino acids and the structure of the genetic code. This makes sense, as it is expected, more generally, that there should be a close link between the origin of metabolism itself and RNAs, that is, replication.

Acknowledgements

This work was carried out at the MCD Biology department of the University of Colorado at Boulder and was supported by the NIH and NASA under Grant nos. GM48080 and NCC21052, respectively, given to Dr. M. Yarus. I thank M. Buvoli for his ‘tautological advices’, and I. de Zwart, M. Illangasekare, T. Janas, T. Janas, M. Legiewicz, I. Majerfeld, and M. Yarus for useful discussions and help.

References

Alff-Steinberger, C., 1969. The genetic code and error transmission. Proc. Natl. Acad. Sci. U.S.A. 64, 584–591.
Archetti, M., 2004. Codon usage bias and mutations constraints reduce the level of error minimization of the genetic code. J. Mol. Evol. 59, 258–266.
Ardell, D.H., 1998. On error minimisation in a sequential origin of the genetic code. J. Mol. Evol. 47, 1–13.
Ardell, D.H., Sella, G., 2001. On the evolution of redundancy in genetic code. J. Mol. Evol. 53, 269–281.
Ardell, D.H., Sella, G., 2002. No accident: genetic code freeze in error-correcting patterns of the standard genetic code. Phil. Trans. R. Soc. Lond. B Biol. Sci. 357, 1625–1642.
Balasubramanian, R., Seetharamulu, P., Raghunathan, G.A., 1980. Conformational rationale for the origin of the mechanism of nucleic acid-directed protein synthesis of ‘living’ organisms. Orig. Life 10, 15–30.
Baumann, U., Oro, J., 1993. Three stages in the evolution of the genetic code. BioSystems 29, 133–141.
Benner, S.A., Ellington, A.D., Tauer, A., 1989. Modern metabolism as a palimpsest of the RNA world. Proc. Natl. Acad. Sci. U.S.A. 86, 7054–7058.
Bermudez, C.I., Daza, E.E., Andrade, E.M., 1999. Characterization and comparison of Escherichia coli transfer RNAs by graph theory based on secondary structure. J. Theor. Biol. 197, 193–205.
Black, S., 1973. A theory on the origin of life. Adv. Enzymol. Relat. Areas Mol. Biol. 38, 193–234.
Black, S., 1995. Prebiotic 5-substituted uracils and a primitive genetic code. Science 268, 1832.
Bock, A., Forchhammer, K., Heider, J., Leinfelder, W., Sawers, G., Veprek, B., Zinoni, F., 1991. Selenocysteine: the 21st amino acid. Mol. Microbiol. 5, 515–520.
Brack, A., Orgel, L.E., 1975. Structures of alternating polypeptides and their possible prebiotic significance. Nature 256, 383–387.
Cavalier-Smith, T., 2001. Obcells as proto-organism: membrane heredity, lithophosphorylation, and the origin of the genetic code, the first cell, and photosynthesis. J. Mol. Evol. 53, 555–595.
Cedergren, R., Grosjean, H., 1987. On the primacy of primordial RNA. BioSystems 20, 175–180.
Chaley, M.B., Korotkov, E.V., Skryabin, K.G., 1999. Relationships among isoacceptor tRNAs seems to support the coevolution theory of the origin of the genetic code. J. Mol. Evol. 48, 168–177.
Crick, F.H.C., 1968. The origin of the genetic code. J. Mol. Biol. 38, 376–379.
Crothers, D.M., 1982. Nucleic acid aggregation geometry and the possible evolutionary origin of the ribosomes and the genetic code. J. Mol. Biol. 162, 379–391.
Danchin, A., 1989. Homeotopic transformation and the origin of translation. Prog. Biophys. Mol. Biol. 54, 81–86.
de Duve, C., 1991. Blueprint for a Cell: The Nature and Origin of Life. Neil Patterson Publishers, Carolina Biological Supply Company, Burlington, NC.
Di Giulio, M., 1989a. Some aspects of the organization and evolution of the genetic code. J. Mol. Evol. 29, 191–201.
Di Giulio, M., 1989b. The extension reached by the minimization of polarity distances during the evolution of the genetic code. J. Mol. Evol. 29, 288–293.
Di Giulio, M., 1991. On the relationships between the genetic code coevolution hypothesis and the physicochemical hypothesis. Z. Naturforsch. 46C, 305–312.
Di Giulio, M., 1992a. The evolution of aminoacyl-tRNA synthetases, the biosynthetic pathways of amino acids and the genetic code. Orig. Life Evol. Biosph. 22, 309–319.
Di Giulio, M., 1992b. On the origin of the genetic code. Trends Ecol. Evol. 7, 176–178.
Di Giulio, M., 1993. Origin of glutaminyl-tRNA synthetase: an example of palimpsest? J. Mol. Evol. 37, 5–10.
Di Giulio, M., 1994a. The phylogeny of tRNA molecules and the origin of the genetic code. Orig. Life Evol. Biosph. 24, 425–434.
Di Giulio, M., 1994b. On the origin of protein synthesis: a speculative model based on hairpin RNA structures. J. Theor. Biol. 171, 303–308.
Di Giulio, M., 1995. The phylogeny of tRNAs seems to confirm the predictions of the coevolution theory of the origin of the genetic code. Orig. Life Evol. Biosph. 25, 549–564.
Di Giulio, M., 1996. The β-sheets of proteins, the biosynthetic relationships between amino acids, and the origin of the genetic code. Orig. Life Evol. Biosph. 26, 589–609.
Di Giulio, M., 1997a. On the origin of the genetic code. J. Theor. Biol. 187, 573–581.
Di Giulio, M., 1997b. The origin of the genetic code. Trends Biochem. Sci. 22, 49.
Di Giulio, M., 1997c. On the RNA world: evidence in favor of an early ribonucleopeptide world. J. Mol. Evol. 45, 571–578.
Di Giulio, M., 1998. Reflections on the origin of the genetic code: a hypothesis. J. Theor. Biol. 191, 191–196.
Di Giulio, M., 1999. The coevolution theory of the origin of the genetic code. J. Mol. Evol. 48, 253–254.
Di Giulio, M., 2000a. Genetic code origin and the strength of natural selection. J. Theor. Biol. 205, 659–661.
Di Giulio, M., 2000b. The origin of the genetic code cannot be studied using measurements based on the PAM matrix because this matrix reflects the code itself, making any analyses tautologous. J. Theor. Biol. 208, 141–144.
Di Giulio, M., 2000c. The RNA world, the genetic code and the tRNA molecule. Trends Genet. 16, 17–18.
Di Giulio, M., 2000d. The late stage of genetic code structuring took place at a high temperature. Gene 261, 189–195.
Di Giulio, M., 2001. A blind empiricism against the coevolution theory of the genetic code. J. Mol. Evol. 53, 11–17.
Di Giulio, M., 2002. Genetic code origin: are the pathways of the type Glu-tRNAGln → Gln-tRNAGln molecular fossils or not? J. Mol. Evol. 55, 616–622.
Di Giulio, M., 2003. The early phases of genetic code origin: conjectures on the evolution of coded catalysis. Orig. Life Evol. Biosph. 33, 479–489.
Di Giulio, M. A comparison of proteins from Pyrococcus furiosus and Pyrococcus abyssi: barophily in the physicochemical properties of amino acids and in the genetic code. Gene, in press.
Di Giulio, M., Medugno, M., 1998. The historical factor: the biosynthetic relationships between amino acids and their physicochemical properties in the origin of the genetic code. J. Mol. Evol. 46, 615–621.
Di Giulio, M., Medugno, M., 1999. Physicochemical optimization in the genetic code origin as the number of codified amino acids increases. J. Mol. Evol. 49, 1–10.
Di Giulio, M., Medugno, M., 2000. The robust statistical bases of the coevolution theory of the genetic code. J. Mol. Evol. 50, 258–263.
Di Giulio, M., Medugno, M., 2001. The level and landscape of optimization in the origin of the genetic code. J. Mol. Evol. 52, 372–382.
Di Giulio, M., Capobianco, M.R., Medugno, M., 1994. On the optimization of the physicochemical distances between amino acids in the evolution of the genetic code. J. Theor. Biol. 168, 43–51.
Dillon, L.S., 1973. The origins of the genetic code. Bot. Rev. 39, 301–345.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins? J. Theor. Biol. 187, 165–173.
Dunnill, P., 1966. Triplet nucleotide–amino acid pairing, a stereochemical basis for the division between protein and non-protein amino acid. Nature 215, 355–359.
Edwards, M.R., 1989. A possible origin of RNA catalysis in multienzyme complexes. Orig. Life Evol. Biosph. 19, 69–72.
Edwards, M.R., 1996. Metabolite channeling in the origin of life. J. Theor. Biol. 179, 313–322.
Epstein, C.J., 1966. Role of the amino acid code and of selection for conformation in the evolution of proteins. Nature 210, 25–28.
Fitch, W.M., Upper, K., 1987. The phylogeny of tRNA sequences provides evidence for ambiguity reduction in the origin of the genetic code. Cold Spring Harbor Symp. Quant. Biol. 52, 759–767.
Freeland, S.J., Hurst, L.D., 1998a. The genetic code is one in a million. J. Mol. Evol. 47, 238–248.
Freeland, S.J., Hurst, L.D., 1998b. Load minimisation of the genetic code: history does not explain the pattern. Proc. R. Soc. Lond. B 265, 2111–2119.
Freeland, S.J., Knight, R.D., Landweber, L.F., 2000a. Measuring adaptation within the genetic code. Trends Biochem. Sci. 25, 44–45.
Freeland, S.J., Knight, R.D., Landweber, L.F., Hurst, L.D., 2000b. Early fixation of an optimal genetic code. Mol. Biol. Evol. 17, 511–518.
Freeland, S.J., Wu, T., Keulmann, N., 2003. The case for an error minimizing standard genetic code. Orig. Life Evol. Biosph. 33, 457–477.
Gamow, G., 1954. Possible relation between deoxyribonucleic acid and protein structures. Nature 173, 318.
Gibson, T.J., Lamond, A.I., 1990. Metabolic complexity in the RNA world and implications for the origin of protein synthesis. J. Mol. Evol. 30, 7–15.
Goldberg, A.L., Wittes, R.E., 1966. Genetic code: aspects of organization. Science 153, 420–424.
Goldman, N., 1993. Further results on error minimization in the genetic code. J. Mol. Evol. 37, 662–664.
Haig, D., Hurst, L.D., 1991. A quantitative measure of error minimization in the genetic code. J. Mol. Evol. 33, 412–417.
Hartman, H., 1995. Speculations on the origin of the genetic code. J. Mol. Evol. 40, 541–544.
Hasegawa, M., Miyata, T., 1980. On the antisymmetry of the amino acid code table. Orig. Life 10, 265–270.
Hendry, L.B., Bransome Jr., E.D., Hutson, M.S., Campbell, L.K., 1981. First approximation of a stereochemical rationale for the genetic code based on the topography and physicochemical properties of “cavities” constructed from models of DNA. Proc. Natl. Acad. Sci. U.S.A. 78, 7440–7444.
Hopfield, J.J., 1978. Origin of the genetic code: a testable hypothesis based on tRNA structure, sequence, and kinetic proofreading. Proc. Natl. Acad. Sci. U.S.A. 75, 4334–4338.
Ibba, M., Curnow, A.W., Soll, D., 1997. Aminoacyl-tRNA synthesis: divergent routes to a common goal. Trends Biochem. Sci. 22, 39–42.
Ibba, M., Soll, D., 2002. Genetic code: introducing pyrrolysine. Curr. Biol. 12, R464–R466.
Jukes, T.H., 1966. Molecules and Evolution. Columbia University Press, New York.
Jukes, T.H., 1978. The genetic code. Adv. Enzymol. 47, 375–432.
Jungck, J.R., 1978. The genetic code as a periodic table. J. Mol. Evol. 11, 211–224.
Jurka, J., Kolosza, Z., Roterman, I., 1982. Globular proteins, GU wobbling, and the evolution of the genetic code. J. Mol. Evol. 19, 20–27.
Jurka, J., Smith, T.F., 1987a. β-Turn-driven early evolution: the genetic code and biosynthetic pathways. J. Mol. Evol. 25, 15–19.
Jurka, J., Smith, T.F., 1987b. β-Turns in early evolution: chirality, genetic code, and biosynthetic pathways. Cold Spring Harbor Symp. Quant. Biol. 52, 407–410.
Klipcan, L., Safro, M., 2004. Amino acid biogenesis, evolution of the genetic code and aminoacyl-tRNA synthetases. J. Theor. Biol. 228, 389–396.
Knight, R.D., Freeland, S.J., Landweber, L.F., 1999. Selection, history and chemistry: the three faces of the genetic code. Trends Biochem. Sci. 24, 241–247.
Lacey Jr., J.C., Mullins Jr., D.W., 1983. Experimental studies related to the origin of the genetic code and the process of protein synthesis—a review. Orig. Life 13, 3–42.
Lacey Jr., J.C., Wickramasinghe, N.S.M.D., Cook, G.W., 1992. Experimental studies on the origin of the genetic code and the process of protein synthesis: a review update. Orig. Life Evol. Biosph. 22, 243–275.
Lacey Jr., J.C., Wickramasinghe, N.S.M.D., Cook, G.W., Anderson, G., 1993. Couplings of character and of chirality in the origin of the genetic system. J. Mol. Evol. 37, 233–239.
Lamond, A.I., Gibson, T.J., 1990. Catalytic RNA and the origin of genetic systems. Trends Genet. 6, 145–149.
Larkin, D.C., Martins, S.A., Roberts, D.J., Fox, G.E., 2001. Ala-His mediated peptide bond formation revisited. Orig. Life Evol. Biosph. 31, 511–526.
Lozupone, C., Changayil, S., Majerfeld, I., Yarus, M., 2003. Selection of the simplest RNA that binds isoleucine. RNA 9, 1315–1322.
Marlborough, D.I., 1980. Early assignments of the genetic code dependent upon protein structure. Orig. Life 10, 3–14.
McClendon, J.H., 1986. The relationship between the origins of the biosynthetic paths to the amino acids and their coding. Orig. Life 16, 260–270.
McClendon, J.H., 1987. The relationship between the biosynthetic paths to the amino acids and their coding. I: The aliphatic amino acids and proline. Orig. Life 17, 401–417.
Melcher, G., 1974. Stereospecificity of the genetic code. J. Mol. Evol. 3, 121–141.
Miseta, A., 1989. The role of protein associated amino acid precursor molecules in the organization of genetic codons. Physiol. Chem. Phys. Med. NMR 21, 237–242.
Morowitz, H.J., 1992. Beginnings of Cellular Life: Metabolism Recapitulates Biogenesis. Yale University Press/Vail-Ballou Press, Binghamton/New York, pp. 160–171.
Nagyvary, J., Fendler, J.H., 1974. Origin of the genetic code: a physical–chemical model of primitive codon assignments. Orig. Life 5, 357–362.
Nirenberg, M.W., Jones, O.W., Leder, P., Clark, B.F.C., Sly, W.S., Pestka, S., 1963. On the coding of genetic information. Cold Spring Harbor Symp. Quant. Biol. 28, 549–557.
Nelsestuen, G.L., 1978. Amino acid-directed nucleic acid synthesis: a possible mechanism in the origin of life. J. Mol. Evol. 11, 109–120.
Pelc, S.R., 1965. Correlation between coding-triplets and aminoacids. Nature (London) 207, 597–599.
Pelc, S.R., Welton, M.G.E., 1966. Stereochemical relationship between coding triplets and amino-acids. Nature (London) 209, 868–870.
Poole, A.M., Jeffares, D.C., Penny, D., 1998. The path from the RNA world. J. Mol. Evol. 46, 1–17.
Sella, G., Ardell, D.H., 2002. The impact of message on the fitness of a genetic code. J. Mol. Evol. 54, 638–651.
Shimizu, M., 1982. Molecular basis for the genetic code. J. Mol. Evol. 18, 297–303.
Shimizu, M., 1995. Specific aminoacylation of C4N hairpin RNAs with the cognate aminoacyl-adenylates in the presence of a dipeptide: origin of the genetic code. J. Biochem. 117, 23–26.
Siemion, I.Z., Stefanowicz, P., 1992. Periodical changes of amino acid reactivity within the genetic code. BioSystems 27, 77–84.
Sjostrom, M., Wold, S., 1985. A multivariate study of the relationship between the genetic code and the physico-chemical properties of amino acids. J. Mol. Evol. 22, 272–277.
Sonneborn, T.M., 1965. Degeneracy of the genetic code, extent, nature, and genetic implications. In: Bryson, V., Vogel, H.J. (Eds.), Evolving Genes and Proteins. Academic Press, New York, pp. 377–397.
Swanson, R., 1984. A unifying concept for the amino acid code. Bull. Math. Biol. 46, 187–203.
Szathmary, E., 1993. Coding coenzyme handles: a hypothesis for the origin of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 90, 9916–9920.
Szathmary, E., 1999. The origin of the genetic code: amino acids as cofactors in an RNA world. Trends Genet. 15, 223–229.
Szathmary, E., Zintzaras, E., 1992. A statistical test of hypotheses on the organization and evolution of the genetic code. J. Mol. Evol. 35, 185–189.
Taylor, F.J.R., Coates, D., 1989. The code within the codons. BioSystems 22, 177–187.
Tyagi, S., 1981. Origin of translation: the hypothesis of permanently attached adaptors. Orig. Life 11, 343–351.
Tumbula, D.L., Becker, H.D., Chang, W.Z., Soll, D., 2000. Domain-specific recruitment of amide amino acids for protein synthesis. Nature 407, 106–110.
Volkenstein, M.V., 1966. The genetic code of protein structure. Biochim. Biophys. Acta 119, 421–424.
Wachtershauser, G., 1988. Before enzymes and templates: theory of surface metabolism. Microbiol. Rev. 52, 452–484.
Weber, A.L., Lacey Jr., J.C., 1978. Genetic code correlations: amino acids and their anticodon nucleotides. J. Mol. Evol. 11, 199–210.
Welton, M.G.E., Pelc, S.R., 1966. Specificity of the stereochemical relationship between ribonucleic acid-triplets and amino acids. Nature 209, 870–872.
Wetzel, R., 1978. Aminoacyl-tRNA synthetase families and their significance to the origin of the genetic code. Orig. Life 9, 39–50.
Woese, C.R., 1965. On the origin of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 54, 1546–1552.
Woese, C.R., 1967. The Genetic Code. Harper & Row, New York.
Woese, C.R., Dugre, D.H., Dugre, S.A., Kondo, M., Saxinger, W.C., 1966. On the fundamental nature and evolution of the genetic code. Cold Spring Harbor Symp. Quant. Biol. 31, 723–736.
Wolfenden, R.V., Cullis, P.M., Southgate, C.C.F., 1979. Water, protein folding, and the genetic code. Science 206, 575–577.
Wong, J.T., 1975. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 72, 1909–1912.
Wong, J.T., 1976. The evolution of a universal genetic code. Proc. Natl. Acad. Sci. U.S.A. 73, 2336–2340.
Wong, J.T., 1980. Role of minimization of chemical distances between amino acids in the evolution of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 77, 1083–1086.
Wong, J.T., 1981. Coevolution of the genetic code and amino acid biosynthesis. Trends Biochem. Sci. 6, 33–36.
Wong, J.T., 1988. Evolution of the genetic code. Microbiol. Sci. 5, 174–181.
Yarus, M., 1988. A specific amino acid binding site composed of RNA. Science 240, 1751–1758.
Yarus, M., 1991. An RNA–amino acid complex and the origin of the genetic code. New Biologist 3, 183–189.
Yarus, M., 1993. An RNA-amino acid affinity. In: Gesteland, R.F., Atkins, J.F. (Eds.), The RNA World. Cold Spring Harbor Laboratory Press, Plainview, NY, pp. 205–217.
Yarus, M., 1998. Amino acids as RNA ligands: a direct-RNA-template theory for the genetic code’s origin. J. Mol. Evol. 47, 109–117.
Yarus, M., 2000. RNA-ligand chemistry: a testable source for the genetic code. RNA 6, 475–484.
Yarus, M., 2002. Primordial genetics: phenotype of the ribocyte. Ann. Rev. Genet. 36, 121–151.
Yarus, M., Christian, E., 1989. Genetic code origins. Nature 342, 349–350.
Zhu, C.T., Zeng, X.B., Huang, W.D., 2003. Codon usage decreases the error minimization within the genetic code. J. Mol. Evol. 57, 533–537.