Compacta: a fast contig clustering tool for de novo assembled transcriptomes.

Author information

Departamento de Investigaciones Científicas y Tecnológicas de la Universidad de Sonora, Universidad de Sonora, Hermosillo, Mexico.

Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato, Gto, Mexico.

Publication information

BMC Genomics. 2020 Feb 11;21(1):148. doi: 10.1186/s12864-020-6528-x.

DOI:10.1186/s12864-020-6528-x

PMID:32046653

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7014741/

Abstract

BACKGROUND

RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies.

RESULTS

Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can determine the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted according to experimental aims. We compared the performance of Compacta against state of the art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims.
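The idea of clustering contigs by their proportion of shared reads can be illustrated with a small sketch. This is not Compacta's implementation, and the abstract does not specify how the proportion or the coverage cutoff is computed. The sketch assumes that the input is a mapping from each read to the contigs it aligns to, that the proportion is the number of shared reads divided by the read count of the smaller contig, that a simple read-count cutoff stands in for minimum coverage, and that the representative of a cluster is the contig with the most reads. The function name `cluster_contigs` and the parameters `min_reads` and `min_shared` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_contigs(read_hits, min_reads=2, min_shared=0.3):
    """Group contigs whose proportion of shared reads reaches min_shared.

    read_hits maps a read ID to the set of contigs it aligns to.
    All names and default values are illustrative assumptions,
    not Compacta's actual options.
    """
    # Count reads per contig and reads shared by each contig pair.
    reads_per_contig = Counter()
    shared = Counter()
    for contigs in read_hits.values():
        reads_per_contig.update(contigs)
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1

    # Keep only contigs with enough reads (an assumed stand-in for coverage).
    kept = {c for c, n in reads_per_contig.items() if n >= min_reads}

    # Union-find over the kept contigs (the graph's connected components).
    parent = {c: c for c in kept}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Join two contigs when shared reads / reads of the smaller contig
    # reaches the threshold.
    for (a, b), n in shared.items():
        if a in kept and b in kept:
            weight = n / min(reads_per_contig[a], reads_per_contig[b])
            if weight >= min_shared:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for c in kept:
        clusters[find(c)].add(c)

    # Assumed rule: the representative is the member with the most reads
    # (ties broken by contig name).
    return {
        max(members, key=lambda c: (reads_per_contig[c], c)): members
        for members in clusters.values()
    }
```

In this sketch, lowering `min_shared` merges more contigs into fewer clusters, which mirrors the adjustable degree of transcriptome compression that the abstract describes.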

CONCLUSIONS

Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aa6/7014741/b8b549c99254/12864_2020_6528_Fig1_HTML.jpg
