Compilation of tRNA sequences and sequences of tRNA genes

September 2007 edition

Mathias Sprinzl*1, Konstantin S. Vassilenko2

 

1 Laboratorium für Biochemie, Universität Bayreuth, 95440 Bayreuth, Germany and 2 Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia

 

*To whom correspondence should be addressed

Tel.: +49 921 552420

Fax: +49 921 552432

email: mathias.sprinzl@uni-bayreuth.de; kvassil@vega.protres.ru

 

INTRODUCTION

In addition to the 3279 sequences of the last edition from 1998 (1), the new compilation of tRNA sequences and sequences of tRNA genes contains the completely new Genomic tRNA Compilation, which includes the sequences of tRNA genes from complete genomes published up to September 2004. The current database consists of three parts:

1. Genomic tRNA Compilation (MS Excel® file, ZIPed)

2. Compilation of tRNA Sequences (MS Excel® file, ZIPed)

3. Compilation of tRNA Genes (MS Excel® file, ZIPed)

 

Genomic tRNA Compilation,

is the compilation of the sequences of cytoplasmic tRNA genes derived from the sequences of complete genomes included in DNA databases. Since sequences of tRNA genes originating from cellular organelles (e.g. mitochondria of mammalian cells) frequently cannot be fitted to the general cloverleaf scheme, they were not included in the Genomic tRNA Compilation. There are specialised databases dealing with these sequences (see links below).

The current Genomic tRNA Compilation consists of about 7600 tRNA gene sequences from 131 organisms covering archaea, bacteria, and higher and lower eukarya (this compilation was last updated in 2004). The database includes the tRNA gene sequences collected in GtRDB (2) as well as those from additional complete genomes found in DNA databases. tRNA genes were identified by the sequencing teams using common tRNA search programs [e.g. tRNAScan (2)]. If the genomes of different strains of the same organism were sequenced, the corresponding tRNA genes were added to the database independently.

Compilation of tRNA Sequences,

is a summary of tRNA sequences, including modified bases and references to the corresponding publications. The references are restricted to the first publication of the complete sequence unless additional information (e.g. base modifications, corrections, etc.) was obtained later. In such cases additional references were added. This compilation is updated up to September 2007. The table contains the known tRNA sequences of all organisms, including organelles. It is the continuation of the original tRNA compilation first published in 1978.

Compilation of tRNA Genes,

is a summary of the published sequences of tRNA genes that were sequenced individually, not as part of a whole genome. It contains tRNA gene sequences of all organisms and organelles. This table contains about 350 sequences of cytoplasmic tRNA genes that are not included in the Genomic tRNA Database. Most of the tRNA gene entries in this table have references to the publications in which the sequence was reported.

 

PRESENTATION OF SEQUENCES

Sequences are presented as MS Excel® workbooks. All the information collected is split into separate indexed tables according to the type of data (specificity, sequence, organism, etc.), and the descriptions of individual genes are summarised in the main worksheet, which holds the relations between the data tables. Information is retrieved by filling in a query form, which allows the user to enter simple search criteria and to select the type of data to be displayed. The search result is presented as a table containing the descriptions of the genes found. Each description includes a unique ID, amino acid specificity, anticodon sequence, organism name, literature reference, sequence, base pairing and additional comments. The Genomic tRNA Compilation contains additional information about taxonomy, strain, original database source and position of the gene in the genome.

The sequences are aligned in the way that is most compatible with the tRNA phylogeny and the known three-dimensional structures of tRNA (3, 4). The corresponding numbering system is shown in Figure 1. Positions that are not filled in a particular sequence (gaps in the generalised structure) are indicated by a dash. All nucleotide insertions are commented and denoted by underlining at the place of insertion.

These compilations use a one-letter code for all nucleotides, including modified ones. For the standard nucleosides adenosine, cytidine, guanosine, thymidine and uridine, the usual abbreviations A, C, G, T and U, respectively, are used. To designate modified nucleotides, other ASCII characters are employed (see the sheet "Help" in the corresponding MS Excel® file). The terminology and structures of the modified nucleosides occurring in tRNAs follow (5) and (6).

Each sequence in the Compilation of tRNA Sequences and the Compilation of tRNA Genes has a unique six-position identification code ('D' or 'R' for DNA or RNA, respectively; a one-letter code for the amino acid, with X for initiator methionine and Z for selenocysteine; a three-digit code specifying the organism; and one digit for the isoacceptor number). Nucleotides involved in Watson-Crick pairs are marked with '=', GU pairs are indicated with '*', and tertiary interactions are not annotated.
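As an illustration of the identification code and the pairing marks described above, the following Python sketch splits a code into its four fields and assigns a pairing mark to two unmodified bases. It is not part of the database software. The function names, the field order (taken from the order in which the fields are listed above), the example code "RX1230" and the use of a blank for unpaired positions are assumptions made for this example. Modified nucleotides, which the compilations encode with other ASCII characters, are not handled.

```python
import re
from typing import NamedTuple


class TrnaId(NamedTuple):
    molecule: str     # "DNA" or "RNA"
    amino_acid: str   # one-letter code; "X" = initiator Met, "Z" = Sec
    organism: str     # three-digit organism code, kept as text
    isoacceptor: int  # single-digit isoacceptor number


# Assumed layout: D/R, amino acid letter, three digits, one digit.
_ID_PATTERN = re.compile(r"([DR])([A-Z])(\d{3})(\d)")


def parse_trna_id(code: str) -> TrnaId:
    """Split a six-position identification code into its fields."""
    match = _ID_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a six-position identification code: {code!r}")
    kind, amino_acid, organism, iso = match.groups()
    molecule = "DNA" if kind == "D" else "RNA"
    return TrnaId(molecule, amino_acid, organism, int(iso))


# Unmodified bases only; T is accepted so that gene sequences work too.
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
                 ("G", "C"), ("C", "G")}
_GU_PAIRS = {("G", "U"), ("U", "G"), ("G", "T"), ("T", "G")}


def pair_marker(base_5p: str, base_3p: str) -> str:
    """Return '=' for a Watson-Crick pair, '*' for a GU pair, else ' '."""
    pair = (base_5p.upper(), base_3p.upper())
    if pair in _WATSON_CRICK:
        return "="
    if pair in _GU_PAIRS:
        return "*"
    return " "  # assumption: unpaired positions are left blank


if __name__ == "__main__":
    print(parse_trna_id("RX1230"))  # hypothetical code, not a real entry
    print(pair_marker("G", "C"), pair_marker("G", "U"))
```

The organism code is kept as text rather than converted to a number so that leading zeros, if any occur, are preserved.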

In addition to the plain-text table, the search results can be explored by presenting the sequences in cloverleaf form (Figure 1). One can scroll through the sequences found one by one or select the sequence of interest directly from the result table. The presentation supports a colour code for different structural features in the canonical cloverleaf model.

Simple statistical information on the occurrence of certain bases at given positions and on base-pairing preferences can also be obtained on a special data sheet.
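For readers who export aligned sequences from the workbooks, a comparable per-position count can be sketched in a few lines of Python. This is only an illustration and not the procedure used in the data sheet. The function name and the toy sequences are assumptions, and the column index used here is simply the position in the exported string, not the tRNA numbering of Figure 1.

```python
from collections import Counter
from typing import List, Sequence


def position_counts(aligned: Sequence[str]) -> List[Counter]:
    """Count the symbols found in each column of equally long aligned sequences.

    Gaps ('-') and one-letter codes for modified nucleotides are counted
    like any other character.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned) walks the alignment column by column.
    return [Counter(column) for column in zip(*aligned)]


if __name__ == "__main__":
    toy_alignment = ["GC-A", "GCUA", "AC-A"]  # made-up data for illustration
    for index, counts in enumerate(position_counts(toy_alignment)):
        print(index, dict(counts))
```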

 

Useful links:

The RNA Modification Database
http://medlib.med.utah.edu/RNAmods

A database for plant mitochondrial tRNA genes and molecules
http://www.ba.itb.cnr.it/PLMItRNA/

Compilation of mammalian mitochondrial tRNA genes
http://mamit-trna.u-strasbg.fr

GtRDB: The Genomic tRNA Database
http://gtrnadb.ucsc.edu/

 

ACKNOWLEDGEMENT

This project was supported by Fonds der Chemischen Industrie and Universität Bayreuth. We are grateful for advice, cooperation and help with data collection to Todd Michael Johnson Lowe, Genetics, Stanford University, California, Dr. Carlos Hoyo-Vadillo, Departamento de Farmacologia y Toxicologia, Cinvestav, Mexico City, Mexico, and Yvonne Baberowski, Mark Dürr and Bernhard Thielen, Universität Bayreuth.

 

REFERENCES

 

1. Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A. and Steinberg, S. (1998) Nucl. Acids Res. 26, 148-153.

2. Rainaldi, G., Volpicella, M., Licciulli, F., Liuni, S., Gallerani, R. and Ceci, R. (2003) Nucl. Acids Res. 31, 436-438.

3. Helm, M., Brule, H., Friede, D., Giege, R., Putz, D. and Florentz, C. (2000) RNA 6, 1356-1379.

4. Lowe, T.M. and Eddy, S.R. (1997) Nucl. Acids Res. 25, 955-964.

5. Schimmel, P.R., Söll, D. and Abelson, J.N., Eds. (1979) Transfer-RNA: Structure, properties and recognition, Cold Spring Harbor Laboratory, pp. 518-519.

6. Steinberg, S.V. and Kisselev, L.L. (1992) Biochimie 74, 337-351.

7. Limbach, P.A., Crain, P.F. and McCloskey, J.A. (1994) Nucl. Acids Res. 22, 2183-2196.

8. Crain, P.F. and McCloskey, J.A. (1997) Nucl. Acids Res. 25, 126-127.

 

 

tRNA database searching engine
An Internet service that allows users to find records in the database according to multiple search criteria. Complex sequence-based queries can be formed (updated for the data in the Compilation of tRNA Genes and the Compilation of tRNA Sequences up to the end of 1998).

 

tRNA-Editor
Researchers who wish to perform an advanced search for tRNA sequences according to several criteria (e.g. anticodon, amino acid specificity, modified nucleoside), or who wish to print the requested sequences in cloverleaf form, can download the appropriate Windows 3.1-based software as a 900 kB ZIPed file (updated for the data in the Compilation of tRNA Genes and the Compilation of tRNA Sequences up to the end of 1998).