EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM.

Affiliation

School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada.

Publication information

Bioinformatics. 2020 Jul 1;36(Suppl_1):i353-i361. doi: 10.1093/bioinformatics/btaa447.

DOI:10.1093/bioinformatics/btaa447
PMID:32657367
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7355264/
Abstract

MOTIVATION

Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood.

RESULTS

We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes.
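To make the idea of context-dependent simulation concrete, the following is a minimal sketch and not the EvoLSTM implementation. It assumes the 14 flanking nucleotides are split as 7 on each side (the abstract does not say how they are divided), models substitutions only, and replaces the trained sequence-to-sequence LSTM with a hand-written, hypothetical rule (`toy_mutation_probs`) that raises the C-to-T probability in a CpG context. All function names and rate values are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"
FLANK = 7  # assumption: 14 flanking nucleotides taken as 7 on each side


def context_windows(seq, flank=FLANK, pad="N"):
    """Return, for every position, the base together with its flanking bases.

    Positions near the ends are padded with `pad` so all windows share a length.
    """
    padded = pad * flank + seq + pad * flank
    return [padded[i:i + 2 * flank + 1] for i in range(len(seq))]


def toy_mutation_probs(window, base_rate=0.01, cpg_rate=0.10):
    """Hypothetical stand-in for a trained model.

    Returns P(descendant base | context window) as a dict over ALPHABET.
    The only context effect encoded here is an elevated C->T rate when the
    central C is followed by G; the rates are made-up illustration values.
    """
    center = len(window) // 2
    base = window[center]
    # Uniform low substitution probability to each of the other three bases.
    probs = {b: base_rate / 3 for b in ALPHABET if b != base}
    if base == "C" and center + 1 < len(window) and window[center + 1] == "G":
        probs["T"] = cpg_rate
    # Remaining probability mass: the base is conserved.
    probs[base] = 1.0 - sum(probs.values())
    return probs


def simulate_descendant(ancestor, probs_fn=toy_mutation_probs, rng=None):
    """Sample one descendant sequence, position by position, from probs_fn."""
    rng = rng or random.Random()
    descendant = []
    for window in context_windows(ancestor):
        probs = probs_fn(window)
        bases = list(probs)
        weights = [probs[b] for b in bases]
        descendant.append(rng.choices(bases, weights=weights)[0])
    return "".join(descendant)


if __name__ == "__main__":
    ancestor = "ACGTTACGCGATTACAGGCT"
    print(simulate_descendant(ancestor, rng=random.Random(0)))
```

In a learned simulator, `probs_fn` would be supplied by the trained network rather than a fixed rule; keeping it as a parameter is a design choice of this sketch so the sampling loop stays independent of how the probabilities are produced.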

AVAILABILITY AND IMPLEMENTATION

Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
