Compacta: a fast contig clustering tool for de novo assembled transcriptomes.

Affiliations

Departamento de Investigaciones Científicas y Tecnológicas de la Universidad de Sonora, Universidad de Sonora, Hermosillo, Mexico.

Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato, Gto, Mexico.

Publication information

BMC Genomics. 2020 Feb 11;21(1):148. doi: 10.1186/s12864-020-6528-x.

Abstract

BACKGROUND

RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies.

RESULTS

Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can set the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted to experimental aims. We compared the performance of Compacta against state-of-the-art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims.
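The general idea of clustering contigs by shared reads can be illustrated with a minimal sketch. This is not Compacta's implementation: the function name `cluster_contigs`, the parameters `min_reads` and `threshold`, the use of a simple read count as the coverage filter, and the definition of the shared-read proportion (shared reads divided by the read count of the smaller contig) are all assumptions made for illustration. Contigs are treated as graph nodes, an edge joins two contigs whose shared-read proportion reaches the threshold, and clusters are the connected components.

```python
from collections import defaultdict


def cluster_contigs(read_to_contigs, min_reads=1, threshold=0.3):
    """Group contigs into clusters by proportion of shared reads.

    Hypothetical sketch, not Compacta's actual algorithm or interface.
    read_to_contigs maps a read ID to the contigs it aligns to.
    """
    # Collect the set of reads supporting each contig.
    contig_reads = defaultdict(set)
    for read, contigs in read_to_contigs.items():
        for contig in contigs:
            contig_reads[contig].add(read)

    # Coverage filter (assumed here to be a plain read count).
    kept = {c: r for c, r in contig_reads.items() if len(r) >= min_reads}

    # Union-find structure to track connected components.
    parent = {c: c for c in kept}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Naive all-pairs comparison; adequate for a sketch, quadratic in contigs.
    names = sorted(kept)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = len(kept[a] & kept[b])
            # Assumed definition: shared reads over the smaller contig's reads.
            if shared / min(len(kept[a]), len(kept[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for c in names:
        clusters[find(c)].append(c)
    return sorted(clusters.values())


reads = {
    "r1": ["c1", "c2"], "r2": ["c1", "c2"], "r3": ["c1"],
    "r4": ["c3"], "r5": ["c3"], "r6": ["c2", "c3"],
}
strict = cluster_contigs(reads, threshold=0.5)
loose = cluster_contigs(reads, threshold=0.3)
```

In this toy input, a stricter threshold keeps `c3` separate while a looser one merges all three contigs, which mirrors the adjustable degree of compression described above.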

CONCLUSIONS

Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/7014741/b8b549c99254/12864_2020_6528_Fig1_HTML.jpg
