TransIntegrator: capture nearly full protein-coding transcript variants via integrating Illumina and PacBio transcriptomes.

Affiliations

State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China.

National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China.

Publication information

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad334.

Abstract

Genes can produce transcript variants that perform specific cellular functions. However, accurately detecting all transcript variants remains a long-standing challenge, especially when working with poorly annotated genomes or without a known genome. To address this issue, we have developed a new computational method, TransIntegrator, which enables transcriptome-wide detection of novel transcript variants. For this, we generated 10 Illumina sequencing transcriptomes and a PacBio full-length transcriptome for consecutive embryo development stages of amphioxus, a species of great evolutionary importance. Based on these transcriptomes, we employed TransIntegrator to create a comprehensive transcript variant library, namely iTranscriptome. The resulting iTranscriptome contained 91 915 distinct transcript variants, with an average of 2.4 variants per gene. This substantially improved the current amphioxus genome annotation by expanding the number of genes from 21 954 to 38 777. Further analysis showed that the gene expansion was largely attributable to the integration of multiple Illumina datasets rather than to the inclusion of the PacBio data. Moreover, we demonstrated an example application of TransIntegrator, via generating an iTranscriptome, in aiding accurate transcriptome assembly, which significantly outperformed other hybrid methods such as IDP-denovo and Trinity. For user convenience, we have deposited the source code of TransIntegrator on GitHub and released a conda package on Anaconda. In summary, this study proposes an affordable but efficient method for reliable transcriptomic research in most species.
