Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq.

Authors

Paniagua Alejandro, Agustín-García Cristina, Pardo-Palacios Francisco J, Brown Thomas, De Maria Maite, Denslow Nancy D, Mazzoni Camila J, Conesa Ana

Affiliations

Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain.

Department of Computer Science, Universitat de València, Valencia 46100, Spain.

Publication information

Genome Res. 2025 Apr 14;35(4):1053-1064. doi: 10.1101/gr.279864.124.

Abstract

While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not been developed at the same pace. Long-read RNA sequencing offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences (PacBio) Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to existing short-read-based annotation. At the loci level, both annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d183/12047274/4d204e786763/1053f01.jpg
