UnigeneFinder: An Automated Pipeline for Gene Calling From Transcriptome Assemblies Without a Reference Genome.

Author information

Xue Bo, Prado Karine, Rhee Seung Yon, Stata Matt

Affiliations

Plant Resilience Institute, Michigan State University, East Lansing, Michigan, USA.

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA.

Publication information

Plant Direct. 2025 Apr 22;9(4):e70056. doi: 10.1002/pld3.70056. eCollection 2025 Apr.

Abstract

For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably to or better than existing tools on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high-quality reference genomes. These features, coupled with the pipeline's cross-platform implementation, its focus on automation, and its accessible, user-friendly interface, make UnigeneFinder a useful tool for many downstream sequence-based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently on both high-performance computing (HPC) systems and personal computers, further reducing barriers to use.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/393e/12012387/99a43bb6a286/PLD3-9-e70056-g002.jpg
