De Novo Emerged Gene Search in Eukaryotes with DENSE.

Affiliations

Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France.

Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany.

Publication information

Genome Biol Evol. 2024 Aug 5;16(8). doi: 10.1093/gbe/evae159.

Abstract

The discovery of de novo emerged genes, originating from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, the hypothesis of neutrally evolving sequences giving rise to functional proteins is highly unlikely. This conundrum has sparked numerous studies to quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet, no fully automated pipeline for their identification is available. Therefore, we introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, while the second step is accessible through a web server upon providing a list of TRGs. Highly flexible, DENSE provides various strategy and parameter combinations, enabling users to adapt to specific configurations or define their own strategy through a rational framework, facilitating protocol communication and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics for users to define input datasets, identify favorable/unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, predictions made for the seven model organisms are compiled into a requestable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific criteria combinations.
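The two-step logic described in the abstract can be illustrated with a minimal Python sketch. This is not DENSE's code or interface: the `Gene` record, the function names, the `min_outgroups` parameter, and the toy species labels are all hypothetical, and the homology and synteny results are assumed to be precomputed inputs rather than derived from real sequence searches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gene:
    """Toy record for one gene of the focal genome (all fields are illustrative)."""
    name: str
    # Species in which a protein homolog was detected (assumed precomputed).
    homolog_species: set[str] = field(default_factory=set)
    # Outgroup species with a syntenic, noncoding match (assumed precomputed).
    noncoding_synteny_species: set[str] = field(default_factory=set)


def find_trgs(genes: list[Gene], clade: set[str]) -> list[Gene]:
    """Step 1 (phylostratigraphy-like): keep genes with no homolog outside `clade`."""
    return [g for g in genes if g.homolog_species <= clade]


def filter_de_novo(trgs: list[Gene], outgroups: set[str],
                   min_outgroups: int = 1) -> list[Gene]:
    """Step 2: keep TRGs with a syntenic noncoding match in enough outgroup genomes."""
    return [
        g for g in trgs
        if len(g.noncoding_synteny_species & outgroups) >= min_outgroups
    ]


# Hypothetical toy data: three genes, one focal clade, two outgroup genomes.
genes = [
    Gene("geneA", {"focal", "sister"}, {"outgroup1"}),
    Gene("geneB", {"focal", "sister", "outgroup1"}, set()),
    Gene("geneC", {"focal"}, set()),
]
clade = {"focal", "sister"}
outgroups = {"outgroup1", "outgroup2"}

trgs = find_trgs(genes, clade)
de_novo = filter_de_novo(trgs, outgroups)
print([g.name for g in trgs])
print([g.name for g in de_novo])
```

In this toy setup, `geneB` is excluded at step 1 because it has a homolog outside the clade, and `geneC` is excluded at step 2 because no noncoding syntenic match supports a de novo origin. Raising `min_outgroups` mimics a stricter criteria combination.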

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fac8/11363675/588404b7a363/evae159f1.jpg
