Conserved introns reveal novel transcripts in Drosophila melanogaster

  Michael Hiller (1,2,9),
  Sven Findeiß (3,4),
  Sandro Lein (5),
  Manja Marz (3),
  Claudia Nickel (5),
  Dominic Rose (3),
  Christine Schulz (6),
  Rolf Backofen (1),
  Sonja J. Prohaska (3,4,7),
  Gunter Reuter (5), and
  Peter F. Stadler (3,4,6,7,8)
  1. Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany;
  2. Department of Developmental Biology, Stanford University, Stanford, California 94305, USA;
  3. Bioinformatics Group, Department of Computer Science, University of Leipzig, D-04107 Leipzig, Germany;
  4. Interdisciplinary Center of Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany;
  5. Institute of Genetics, Biologicum, Martin Luther University Halle-Wittenberg, D-06108 Halle, Germany;
  6. RNomics Group, Fraunhofer Institut für Zelltherapie und Immunologie–IZI, D-04103 Leipzig, Germany;
  7. Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, A-1090 Wien, Austria;
  8. Santa Fe Institute, Santa Fe, New Mexico 87501, USA

    Abstract

    Noncoding RNAs that are—like mRNAs—spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with a high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes—either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon–intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.

    Footnotes

    • 9 Corresponding author.

      E-mail hillerm{at}stanford.edu; fax (650) 724-3621.

    • [Supplemental material is available online at www.genome.org. Partial sequences of experimentally confirmed novel transcripts have been deposited in GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. FJ528666–FJ528673 and FJ845365–FJ845382.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.090050.108.

      • Received December 8, 2008.
      • Accepted March 18, 2009.