Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform.

Authors

Maiolo Massimo, Ulzega Simone, Gil Manuel, Anisimova Maria

Affiliation

Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), CH-8820 Wädenswil, Switzerland.

Publication

NAR Genom Bioinform. 2020 Nov 6;2(4):lqaa092. doi: 10.1093/nargab/lqaa092. eCollection 2020 Dec.

DOI:10.1093/nargab/lqaa092
PMID:33575636
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671320/
Abstract

Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on an explicit model of indel evolution, the Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionarily meaningful gap patterns and is robust to the 'over-alignment' bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method's complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate evolutionary indel-based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate 'linking blocks'. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices, and their tracebacks are merged to yield the final alignment. The new algorithm can benefit greatly from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data.
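To make the block-detection step more concrete, here is a minimal Python sketch of the MAFFT-style idea the abstract builds on: residues are mapped to numeric volume and polarity values, and the resulting signals are compared by FFT-based cross-correlation. It assumes volume and polarity values as we recall them from Grantham (1974), which are worth checking against the original table. It also substitutes a multi-scale windowed FFT cross-correlation for the authors' short-time Fourier transform, so it illustrates the principle rather than reproducing the paper's procedure. The names `encode`, `fft_xcorr` and `local_offsets`, the window sizes, and the step are hypothetical.

```python
import numpy as np

# Assumption: approximate Grantham (1974) side-chain volume and polarity
# values, recalled from the literature; verify before serious use.
VOLUME = {
    "A": 31.0, "R": 124.0, "N": 56.0, "D": 54.0, "C": 55.0,
    "Q": 85.0, "E": 83.0, "G": 3.0, "H": 96.0, "I": 111.0,
    "L": 111.0, "K": 119.0, "M": 105.0, "F": 132.0, "P": 32.5,
    "S": 32.0, "T": 61.0, "W": 170.0, "Y": 136.0, "V": 84.0,
}
POLARITY = {
    "A": 8.1, "R": 10.5, "N": 11.6, "D": 13.0, "C": 5.5,
    "Q": 10.5, "E": 12.3, "G": 9.0, "H": 10.4, "I": 5.2,
    "L": 4.9, "K": 11.3, "M": 5.7, "F": 5.2, "P": 8.0,
    "S": 9.2, "T": 8.6, "W": 5.4, "Y": 6.2, "V": 5.9,
}


def _zscore(table):
    """Normalize a property table to zero mean and unit variance."""
    vals = np.array(list(table.values()))
    mu, sd = vals.mean(), vals.std()
    return {aa: (v - mu) / sd for aa, v in table.items()}


_VOL_Z, _POL_Z = _zscore(VOLUME), _zscore(POLARITY)


def encode(seq):
    """Map an amino acid string to a (2, L) array: normalized volume, polarity."""
    return np.array([[_VOL_Z[r] for r in seq], [_POL_Z[r] for r in seq]])


def fft_xcorr(x, y):
    """Return c[k] = sum_n x[n] * y[n + k] for k = -(len(x)-1) .. len(y)-1."""
    n = len(x) + len(y) - 1
    size = 1 << (n - 1).bit_length()  # zero-pad to a power of two >= n
    prod = np.fft.rfft(x[::-1], size) * np.fft.rfft(y, size)
    return np.fft.irfft(prod, size)[:n]


def local_offsets(a, b, windows=(8, 16), step=4):
    """Hypothetical multi-scale scan: for each window of `a`, find the start in
    `b` whose equally long window has the highest cosine similarity.

    Returns a list of (window, start_in_a, best_start_in_b, score) tuples.
    """
    ea, eb = encode(a), encode(b)
    # Prefix sums of squared values give each b-window's norm in O(1).
    sq = np.concatenate([[0.0], np.cumsum((eb ** 2).sum(axis=0))])
    hits = []
    for w in windows:
        if w > min(len(a), len(b)):
            continue  # scale too coarse for these sequences
        b_norms = np.sqrt(np.maximum(sq[w:] - sq[:-w], 0.0))
        for start in range(0, len(a) - w + 1, step):
            seg = ea[:, start:start + w]
            # Sum the correlations of both property channels, as MAFFT does.
            c = sum(fft_xcorr(seg[p], eb[p]) for p in range(2))
            dots = c[w - 1:len(b)]  # offsets 0 .. len(b)-w: window inside b
            scores = dots / (np.linalg.norm(seg) * b_norms + 1e-12)
            j = int(np.argmax(scores))
            hits.append((w, start, j, float(scores[j])))
    return hits


b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
a = b[5:25]  # a shares an exact block with b, starting at b[5]
for hit in local_offsets(a, b, windows=(8,), step=4):
    print(hit)
```

Consecutive windows whose best hits lie on the same diagonal (constant `best_start_in_b - start_in_a`) with scores near 1 mark a candidate homologous block; the regions between such blocks correspond to what the abstract calls linking blocks. As a rough check on the speed-up claim, assume k equal blocks on the diagonal of a DP cube with side length L: they hold k(L/k)^3 = L^3/k^2 cells in total, and processing the k blocks concurrently divides wall-clock time by roughly another factor of k, which is consistent with the k^3 scaling stated above (linking blocks and merge costs ignored).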

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86c/7671320/6432fa5c4caa/lqaa092fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86c/7671320/bca4311d6b66/lqaa092fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86c/7671320/42217dcddffa/lqaa092fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86c/7671320/48873cf318ce/lqaa092fig5.jpg

Similar articles

1. Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform.
   NAR Genom Bioinform. 2020 Nov 6;2(4):lqaa092. doi: 10.1093/nargab/lqaa092. eCollection 2020 Dec.
2. Progressive multiple sequence alignment with indel evolution.
   BMC Bioinformatics. 2018 Sep 21;19(1):331. doi: 10.1186/s12859-018-2357-1.
3. ProPIP: a tool for progressive multiple sequence alignment with Poisson Indel Process.
   BMC Bioinformatics. 2021 Oct 24;22(1):518. doi: 10.1186/s12859-021-04442-8.
4. ARPIP: Ancestral Sequence Reconstruction with Insertions and Deletions under the Poisson Indel Process.
   Syst Biol. 2023 Jun 16;72(2):307-318. doi: 10.1093/sysbio/syac050.
5. A Modified Multiple Alignment Fast Fourier Transform with Higher Efficiency.
   IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):634-645. doi: 10.1109/TCBB.2016.2530064. Epub 2016 Feb 15.
6. A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference.
   Syst Biol. 2017 Sep 1;66(5):698-714. doi: 10.1093/sysbio/syx033.
7. General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?
   BMC Bioinformatics. 2016 Aug 11;17:304. doi: 10.1186/s12859-016-1105-7.
8. Bayesian coestimation of phylogeny and sequence alignment.
   BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
9. Sequence alignments and pair hidden Markov models using evolutionary history.
   J Mol Biol. 2003 Oct 17;333(2):453-60. doi: 10.1016/j.jmb.2003.08.015.
10. Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: a dynamic programming algorithm for multiple alignment segmentation.
   J Theor Biol. 2001 Sep 21;212(2):129-39. doi: 10.1006/jtbi.2001.2319.

Cited by

1. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications.
   Mol Biol Evol. 2024 Sep 4;41(9). doi: 10.1093/molbev/msae177.
2. Statistical framework to determine indel-length distribution.
   Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae043.
3. ProPIP: a tool for progressive multiple sequence alignment with Poisson Indel Process.
   BMC Bioinformatics. 2021 Oct 24;22(1):518. doi: 10.1186/s12859-021-04442-8.

References

1. Progressive multiple sequence alignment with indel evolution.
   BMC Bioinformatics. 2018 Sep 21;19(1):331. doi: 10.1186/s12859-018-2357-1.
2. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
   Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16.
3. Evolutionary inference via the Poisson Indel Process.
   Proc Natl Acad Sci U S A. 2013 Jan 22;110(4):1160-6. doi: 10.1073/pnas.1220450110. Epub 2012 Dec 28.
4. Nature, position, and frequency of mutations made in a single cycle of HIV-1 replication.
   J Virol. 2010 Oct;84(19):9864-78. doi: 10.1128/JVI.00915-10. Epub 2010 Jul 21.
5. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.
   Science. 2008 Jun 20;320(5883):1632-5. doi: 10.1126/science.1158395.
6. DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
   BMC Bioinformatics. 2005 Mar 22;6:66. doi: 10.1186/1471-2105-6-66.
7. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
   Nucleic Acids Res. 2002 Jul 15;30(14):3059-66. doi: 10.1093/nar/gkf436.
8. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations.
   Nucleic Acids Res. 2001 Jan 1;29(1):323-6. doi: 10.1093/nar/29.1.323.
9. An efficient method for matching nucleic acid sequences.
   Nucleic Acids Res. 1982 Jan 11;10(1):133-9. doi: 10.1093/nar/10.1.133.
10. Amino acid difference formula to help explain protein evolution.
   Science. 1974 Sep 6;185(4154):862-4. doi: 10.1126/science.185.4154.862.