ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs.

Author information

Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65, Solna, Sweden.

Department of Medicine Huddinge, Center for Hematology and Regenerative Medicine (HERM), Karolinska Institute, 141 52, Flemingsberg, Sweden.

Publication information

BMC Bioinformatics. 2024 Feb 1;25(1):54. doi: 10.1186/s12859-024-05663-3.

Abstract

BACKGROUND

Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms.

RESULTS

We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species and showed that it reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at the cost of a moderate reduction in precision. In terms of recall, ClusTrast ranked highest at the lower end of expression levels (below the 15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35-69% across the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58-81%) were reconstructed with contigs that exhibited polymorphism, as measured on a subset of reliably predicted contigs. ClusTrast recall increased when a union of assembled transcripts from more than one assembly tool was used as the primary assembly.
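The data flow between the pipeline stages can be illustrated with a small toy sketch. It is not the ClusTrast implementation: the abstract does not specify the clustering, alignment, assembly, or merging methods, so the function names, the input structures, and the choice to drop exact duplicate sequences when merging are all assumptions made for illustration. The sketch only shows how reads might be partitioned by the cluster of the guiding contig they align to, and how primary and clusterwise contigs might then be combined.

```python
from collections import defaultdict


def group_reads_by_cluster(contig_to_cluster, read_alignments):
    """Collect, for each cluster, the reads aligned to any of its guiding contigs.

    contig_to_cluster: dict mapping guiding contig id -> cluster id (hypothetical input).
    read_alignments: iterable of (read_id, contig_id) pairs (hypothetical input).
    A read aligned to contigs in several clusters is placed in each of them.
    """
    reads_per_cluster = defaultdict(set)
    for read_id, contig_id in read_alignments:
        cluster_id = contig_to_cluster.get(contig_id)
        if cluster_id is not None:  # skip alignments to contigs that were not clustered
            reads_per_cluster[cluster_id].add(read_id)
    return dict(reads_per_cluster)


def merge_assemblies(primary, clusterwise):
    """Combine primary and clusterwise contigs into one list.

    Assumption: exact duplicate sequences are kept only once; the real tool
    may merge differently.
    """
    seen = set()
    merged = []
    candidates = list(primary)
    for contigs in clusterwise.values():
        candidates.extend(contigs)
    for seq in candidates:
        if seq not in seen:
            seen.add(seq)
            merged.append(seq)
    return merged


# Toy inputs: three guiding contigs in two clusters, four reads.
contig_to_cluster = {"c1": 0, "c2": 0, "c3": 1}
read_alignments = [("r1", "c1"), ("r2", "c2"), ("r2", "c3"), ("r3", "c3"), ("r4", "c9")]
groups = group_reads_by_cluster(contig_to_cluster, read_alignments)

# Stand-ins for the output of the primary and per-cluster assembly steps.
primary = ["ACGT", "GGCC"]
clusterwise = {0: ["ACGT", "TTAA"], 1: ["GGCC", "CCGG"]}
final_assembly = merge_assemblies(primary, clusterwise)
```

In this toy run, read `r2` lands in both clusters because it aligns to contigs from each, and read `r4` is dropped because its contig was never clustered. Each per-cluster read set would then be handed to an assembler separately, which is the step the sketch replaces with hard-coded contig lists.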

CONCLUSION

We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6704/10836024/3306324f1f2f/12859_2024_5663_Fig1_HTML.jpg
