CoExpPhylo - a novel pipeline for biosynthesis gene discovery.

Author information

Grünig Nele, Pucker Boas

Affiliations

Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany.

Institute for Cellular and Molecular Botany (IZMB), University of Bonn, Kirschallee 1, Bonn, 53115, Germany.

Publication information

BMC Genomics. 2025 Sep 22;26(1):807. doi: 10.1186/s12864-025-12061-3.

Abstract

BACKGROUND

The rapid advancement of sequencing technologies has drastically increased the availability of plant genomic and transcriptomic data, shifting the challenge from data generation to functional interpretation. Identifying genes involved in specialized metabolism remains difficult. Coexpression analysis is widely used to identify genes acting in the same pathway or process, but it has limitations, particularly in distinguishing genes that are coexpressed because of shared regulatory triggers from genes that are directly involved in the same pathway. Integrating phylogenetic analysis adds a layer of confidence to functional predictions by taking evolutionary conservation into account. Here, we introduce CoExpPhylo, a computational pipeline that systematically combines coexpression analysis and phylogenetics to identify candidate genes involved in specialized biosynthetic pathways across multiple species, starting from one or more bait genes.

RESULTS

CoExpPhylo systematically integrates coexpression information and phylogenetic signals to identify candidate genes involved in specialized biosynthetic pathways. The pipeline consists of multiple computational steps: (1) species-specific coexpression analysis, (2) local sequence alignment to identify orthologs, (3) clustering of candidate genes into Orthologous Coexpressed Groups (OCGs), (4) functional annotation, (5) global sequence alignment, (6) phylogenetic tree generation, and optionally (7) visualization. The workflow is highly customizable, allowing users to adjust correlation thresholds, filtering parameters, and annotation sources. Benchmarking CoExpPhylo on multiple pathways, including the biosynthesis of anthocyanins, proanthocyanidins, and flavonols, as well as lutein and zeaxanthin, confirmed its ability to recover known genes while also suggesting novel candidates.
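Steps (1) to (3) can be illustrated with a minimal sketch. It is not CoExpPhylo's actual implementation: the function names, the Pearson correlation measure, the 0.7 threshold, the toy expression values, and the use of connected components for grouping are all assumptions made for illustration. The hard-coded `similar_pairs` list stands in for hits that a local alignment tool would report in step (2).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def coexpressed_genes(expression, bait, min_r=0.7):
    """Step 1 (assumed form): genes of one species correlated with the bait."""
    bait_profile = expression[bait]
    return {
        gene
        for gene, profile in expression.items()
        if gene != bait and pearson(bait_profile, profile) >= min_r
    }


def group_candidates(candidates, similar_pairs):
    """Step 3 (assumed form): connected components over similarity links."""
    parent = {gene: gene for gene in candidates}

    def find(gene):
        # Follow parent links to the root, compressing the path on the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in similar_pairs:
        if a in parent and b in parent:  # ignore links to non-candidates
            parent[find(a)] = find(b)

    groups = {}
    for gene in candidates:
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: expression values across four samples per species.
species_a = {
    "A_bait": [1.0, 2.0, 3.0, 4.0],
    "A_g1": [2.0, 4.1, 6.2, 7.9],
    "A_g2": [4.0, 1.0, 3.0, 2.0],
}
species_b = {
    "B_bait": [5.0, 1.0, 3.0, 2.0],
    "B_g1": [10.0, 2.5, 6.0, 4.2],
    "B_g2": [1.0, 1.0, 1.0, 1.0],
}

hits_a = coexpressed_genes(species_a, "A_bait")
hits_b = coexpressed_genes(species_b, "B_bait")

# Stand-in for step 2: pairs a local alignment search would report as similar.
similar_pairs = [("A_g1", "B_g1")]

ocgs = group_candidates(hits_a | hits_b, similar_pairs)
print(ocgs)
```

In this toy example only one gene per species passes the correlation filter, and the similarity link joins the two into a single group. Connected components are a deliberately simple grouping rule. With large gene families they can merge distant paralogs into one group, which is one reason a subsequent phylogenetic step is useful.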

CONCLUSION

CoExpPhylo provides a systematic framework for identifying candidate genes involved in specialized metabolism. By integrating coexpression data with phylogenetic clustering, it facilitates the discovery of both conserved and lineage-specific genes. The resulting OCGs offer a strong foundation for further experimental validation, bridging the gap between computational predictions and functional characterization. Future improvements, such as incorporating multi-species reference databases and refining clustering for large gene families, could further enhance its resolution. Overall, CoExpPhylo represents a valuable tool for accelerating pathway elucidation and advancing our understanding of specialized metabolism in plants.

