Simulating domain architecture evolution.

Affiliations

Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i134-i142. doi: 10.1093/bioinformatics/btac242.

Abstract

MOTIVATION

Simulation is an essential technique for generating biomolecular data with a 'known' history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation.

RESULTS

Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis-Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation.
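To make the general idea concrete, the following is a minimal toy sketch of a Metropolis-Hastings walk over domain architectures. It is not DomArchov's actual model or code. The toy corpus `OBSERVED`, the bigram-based score `log_score`, the length cap `MAX_LEN`, the pseudocount `PSEUDO`, and all function names are assumptions introduced purely for illustration. Here the "data-driven" part is approximated by domain adjacency counts from the toy corpus, and proposals are single-domain insertions, duplications, and deletions.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus of architectures (N- to C-terminal domain order).
# Illustrative only; these are not data from the paper.
OBSERVED = [
    ("SH3", "SH2", "Kinase"),
    ("SH2", "Kinase"),
    ("SH3", "SH2"),
    ("PH", "Kinase"),
    ("SH2", "SH2", "Kinase"),
]
DOMAINS = sorted({d for arch in OBSERVED for d in arch})
MAX_LEN = 6     # assumed cap on architecture length, keeps the state space finite
PSEUDO = 0.01   # assumed pseudocount so unseen adjacencies are rare, not impossible


def bigram_counts(archs):
    """Count adjacent domain pairs, with start (^) and end ($) markers."""
    counts = Counter()
    for arch in archs:
        padded = ("^",) + tuple(arch) + ("$",)
        counts.update(zip(padded, padded[1:]))
    return counts


COUNTS = bigram_counts(OBSERVED)


def log_score(arch):
    """Unnormalized log target: sum of log smoothed adjacency counts."""
    padded = ("^",) + tuple(arch) + ("$",)
    return sum(math.log(COUNTS[p] + PSEUDO) for p in zip(padded, padded[1:]))


def proposal_distribution(arch):
    """Return {next_state: probability} for one proposed event.

    Each event type (insert, duplicate, delete) gets mass 1/3; an event that
    is not applicable leaves the architecture unchanged (a self-loop).
    """
    arch = tuple(arch)
    n = len(arch)
    q = defaultdict(float)
    third = 1.0 / 3.0
    if n < MAX_LEN:
        for i in range(n + 1):          # insert any domain at any position
            for d in DOMAINS:
                q[arch[:i] + (d,) + arch[i:]] += third / ((n + 1) * len(DOMAINS))
        for i in range(n):              # tandem-duplicate domain i
            q[arch[:i + 1] + (arch[i],) + arch[i + 1:]] += third / n
    else:
        q[arch] += 2 * third
    if n > 1:
        for i in range(n):              # delete domain i
            q[arch[:i] + arch[i + 1:]] += third / n
    else:
        q[arch] += third
    return dict(q)


def mh_step(arch, rng):
    """One Metropolis-Hastings step with the exact Hastings correction."""
    q_fwd = proposal_distribution(arch)
    candidates = list(q_fwd)
    proposal = rng.choices(candidates, weights=[q_fwd[c] for c in candidates])[0]
    if proposal == arch:
        return arch
    q_rev = proposal_distribution(proposal)
    log_alpha = (log_score(proposal) - log_score(arch)
                 + math.log(q_rev[arch]) - math.log(q_fwd[proposal]))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return proposal
    return arch


def simulate(start, n_steps, seed=0):
    """Return the list of visited architectures, including the start state."""
    rng = random.Random(seed)
    arch = tuple(start)
    trajectory = [arch]
    for _ in range(n_steps):
        arch = mh_step(arch, rng)
        trajectory.append(arch)
    return trajectory


if __name__ == "__main__":
    for state in simulate(("SH2", "Kinase"), 10, seed=1):
        print("-".join(state))
```

Because an insertion and a duplication can produce the same architecture, the sketch sums proposal mass per resulting state before computing the acceptance ratio. Under this toy score, architectures built from adjacencies seen in the corpus are visited far more often than those with unseen adjacencies, which is the qualitative behavior the abstract describes.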

AVAILABILITY AND IMPLEMENTATION

DomArchov is written in Python 3 and is available at http://www.cs.cmu.edu/~durand/DomArchov. The data underlying this article are available via the same link.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
