Magallanes-Alba Melisa Eliana, Baricalla Agustín, Rego Natalia, Brun Antonio, Karasov William H, Caviedes-Vidal Enrique
Instituto Multidisciplinario de Investigaciones Biológicas (IMIBIO-SL), Consejo Nacional de Investigaciones Científicas y Técnicas, San Luis, San Luis 5700, Argentina.
Department of Forest and Wildlife Ecology, University of Wisconsin-Madison, Madison, WI 53706, USA.
Biol Methods Protoc. 2023 Jul 7;8(1):bpad013. doi: 10.1093/biomethods/bpad013. eCollection 2023.
The house sparrow (Passer domesticus) is a valuable avian model for studying evolutionary genetics, development, neurobiology, physiology, behavior, and ecology, in both laboratory and field settings. The current annotation of the house sparrow genome available at the Ensembl Rapid Release site focuses primarily on gene set building and lacks functional information. In this study, we present the first comprehensive functional reannotation of the house sparrow genome using intestinal Illumina RNA sequencing (RNA-Seq) libraries. Our revised annotation provides an expanded view of the genome, encompassing 38592 transcripts compared with the 23574 transcripts currently in Ensembl. We also predicted 14717 protein-coding genes, achieving 96.4% completeness for Passeriformes lineage BUSCOs. A substantial improvement in this reannotation is the accurate delineation of untranslated region (UTR) sequences: 82.7% of the transcripts contain a 5'-UTR and 93.8% contain a 3'-UTR. These UTR annotations are crucial for understanding post-transcriptional regulatory processes. Our findings underscore the advantages of incorporating additional specific RNA-Seq data into genome annotation, particularly when leveraging fast and efficient data processing capabilities. This functional reannotation enhances our understanding of the house sparrow genome and provides a valuable resource for future investigations in various research fields.
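As an illustration of how UTR coverage figures like those reported above can be tallied from an annotation file, here is a minimal Python sketch. It is not the authors' pipeline. It assumes a GFF3 file in which transcripts have the feature type `mRNA` and UTRs use the standard Sequence Ontology types `five_prime_UTR` and `three_prime_UTR`, linked to their transcript through the `Parent` attribute. The function name `utr_fractions` and the sample records are hypothetical.

```python
from collections import defaultdict

UTR_TYPES = ("five_prime_UTR", "three_prime_UTR")


def utr_fractions(gff3_lines, transcript_type="mRNA"):
    """Return the fraction of transcripts that have each UTR type.

    Assumes GFF3 input: 9 tab-separated columns, attributes as key=value
    pairs separated by ';', and UTR features pointing at their transcript
    through the Parent attribute.
    """
    transcripts = set()
    utr_parents = defaultdict(set)  # UTR type -> transcript IDs that have it

    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed records
        feature_type = cols[2]
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if feature_type == transcript_type and "ID" in attrs:
            transcripts.add(attrs["ID"])
        elif feature_type in UTR_TYPES:
            # A feature may list several parents, separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    utr_parents[feature_type].add(parent)

    total = len(transcripts)
    if total == 0:
        return {utr: 0.0 for utr in UTR_TYPES}
    return {utr: len(utr_parents[utr] & transcripts) / total for utr in UTR_TYPES}


# Hypothetical two-transcript example: tx1 has both UTRs, tx2 only a 3'-UTR.
SAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\tsketch\tmRNA\t1\t1000\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsketch\tfive_prime_UTR\t1\t100\t.\t+\t.\tParent=tx1",
    "chr1\tsketch\tthree_prime_UTR\t900\t1000\t.\t+\t.\tParent=tx1",
    "chr1\tsketch\tmRNA\t2000\t3000\t.\t+\t.\tID=tx2;Parent=gene2",
    "chr1\tsketch\tthree_prime_UTR\t2900\t3000\t.\t+\t.\tParent=tx2",
])

fractions = utr_fractions(SAMPLE.splitlines())
print(fractions)
```

Annotation files that label transcripts as `transcript` rather than `mRNA` can be handled by passing a different `transcript_type`. Multiplying the returned fractions by 100 gives percentages comparable in form to the 82.7% and 93.8% quoted in the abstract.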