Statistical alignment: computational properties, homology testing and goodness-of-fit.

Authors

Hein J, Wiuf C, Knudsen B, Møller M B, Wibling G

Affiliation

Department of Genetics and Ecology, The Institute of Biological Science, University of Aarhus, Building 540, Ny Munkegade, Aarhus C, 8000, Denmark.

Publication

J Mol Biol. 2000 Sep 8;302(1):265-79. doi: 10.1006/jmbi.2000.4061.

DOI: 10.1006/jmbi.2000.4061
PMID: 10964574

Abstract

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.

Firstly, we show how to accelerate the statistical alignment algorithms several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity based alignment, to get good initial guesses of the evolutionary parameters and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis.

Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.

Finally, we describe a goodness-of-fit test, that allows testing the proposed insertion-deletion (indel) process inherent to this model and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model.
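
The banding idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the TKF91 likelihood recursion and not the authors' implementation: for simplicity it computes a best global alignment score (a maximum, where the paper sums likelihoods), and only visits cells within a fixed distance of a guide path. The function names, the scoring values, and the use of a straight diagonal as the guide path are all assumptions made for illustration; the paper places its band around a similarity-based alignment.

```python
import math


def diagonal_center(n, m):
    """Hypothetical guide path: for each row i, the column on the straight diagonal."""
    return [(i * m) // max(n, 1) for i in range(n + 1)]


def banded_alignment_score(a, b, center, width,
                           match=1.0, mismatch=-1.0, gap=-2.0):
    """Best global alignment score using only cells with |j - center[i]| <= width.

    Cells outside the band are never visited, so the work is roughly
    len(a) * (2 * width + 1) instead of len(a) * len(b).
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    prev = {}  # scores of the previous row, keyed by column index
    for i in range(n + 1):
        lo = max(0, center[i] - width)
        hi = min(m, center[i] + width)
        cur = {}
        for j in range(lo, hi + 1):
            if i == 0 and j == 0:
                cur[j] = 0.0
                continue
            best = -math.inf
            if i > 0 and j > 0 and (j - 1) in prev:  # align a[i-1] with b[j-1]
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, prev[j - 1] + s)
            if i > 0 and j in prev:                  # a[i-1] against a gap
                best = max(best, prev[j] + gap)
            if j > 0 and (j - 1) in cur:             # b[j-1] against a gap
                best = max(best, cur[j - 1] + gap)
            cur[j] = best
        prev = cur
    # -inf means the band did not reach the final cell (n, m)
    return prev.get(m, -math.inf)


if __name__ == "__main__":
    a, b = "ACGT", "AGT"
    guide = diagonal_center(len(a), len(b))
    for w in (0, 1, 10):
        print(w, banded_alignment_score(a, b, guide, w))
```

With a band that is too narrow the result can fall below the unrestricted optimum, which is why the quality of the guide path matters: the closer it lies to the true alignment, the narrower the band can be without changing the answer.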

Similar articles

1. Statistical alignment: computational properties, homology testing and goodness-of-fit.
   J Mol Biol. 2000 Sep 8;302(1):265-79. doi: 10.1006/jmbi.2000.4061.
2. Bayesian coestimation of phylogeny and sequence alignment.
   BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
3. An improved algorithm for statistical alignment of sequences related by a star tree.
   Bull Math Biol. 2002 Jul;64(4):771-9. doi: 10.1006/bulm.2002.0300.
4. Fast model-based protein homology detection without alignment.
   Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
5. Scoredist: a simple and robust protein sequence distance estimator.
   BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108.
6. T-Coffee: A novel method for fast and accurate multiple sequence alignment.
   J Mol Biol. 2000 Sep 8;302(1):205-17. doi: 10.1006/jmbi.2000.4042.
7. A generalized affine gap model significantly improves protein sequence alignment accuracy.
   Proteins. 2005 Feb 1;58(2):329-38. doi: 10.1002/prot.20299.
8. Vestige: maximum likelihood phylogenetic footprinting.
   BMC Bioinformatics. 2005 May 29;6:130. doi: 10.1186/1471-2105-6-130.
9. A simple genetic algorithm for multiple sequence alignment.
   Genet Mol Res. 2007 Oct 5;6(4):964-82.
10. FAST: a novel protein structure alignment algorithm.
    Proteins. 2005 Feb 15;58(3):618-27. doi: 10.1002/prot.20331.

Cited by

1. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications.
   Mol Biol Evol. 2024 Sep 4;41(9). doi: 10.1093/molbev/msae177.
2. Measuring Phylogenetic Information of Incomplete Sequence Data.
   Syst Biol. 2022 Apr 19;71(3):630-648. doi: 10.1093/sysbio/syab073.
3. The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment.
   Syst Biol. 2021 Feb 10;70(2):236-257. doi: 10.1093/sysbio/syaa050.
4. Solving the master equation for Indels.
   BMC Bioinformatics. 2017 May 12;18(1):255. doi: 10.1186/s12859-017-1665-1.
5. Inferring Indel Parameters using a Simulation-based Approach.
   Genome Biol Evol. 2015 Nov 3;7(12):3226-38. doi: 10.1093/gbe/evv212.
6. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs.
   BMC Bioinformatics. 2015 Apr 1;16:108. doi: 10.1186/s12859-015-0516-1.
7. Quantifying variances in comparative RNA secondary structure prediction.
   BMC Bioinformatics. 2013 May 1;14:149. doi: 10.1186/1471-2105-14-149.
8. A stochastic evolutionary model for protein structure alignment and phylogeny.
   Mol Biol Evol. 2012 Nov;29(11):3575-87. doi: 10.1093/molbev/mss167. Epub 2012 Jun 21.
9. PSAR: measuring multiple sequence alignment reliability by probabilistic sampling.
   Nucleic Acids Res. 2011 Aug;39(15):6359-68. doi: 10.1093/nar/gkr334. Epub 2011 May 16.
10. Reticular alignment: a progressive corner-cutting method for multiple sequence alignment.
    BMC Bioinformatics. 2010 Nov 23;11:570. doi: 10.1186/1471-2105-11-570.