Xu Weijia, Miranker Daniel P
Department of Computer Sciences, The Center for Computational Biology and Bioinformatics, University of Texas, Austin, TX 78712, USA.
Bioinformatics. 2004 May 22;20(8):1214-21. doi: 10.1093/bioinformatics/bth065. Epub 2004 Feb 10.
We address the question of whether there exists an effective evolutionary model of amino-acid substitution that forms a metric-distance function. Competing computational methods for determining sequence homology always trade speed against sensitivity. A metric model of evolution is a prerequisite for the development of an entire class of fast sequence analysis algorithms that are both scalable, O(log n), and sensitive.
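To illustrate why a metric matters for fast search, the following minimal sketch shows the triangle-inequality pruning that metric-space indexes rely on. It is not the authors' algorithm: the function name `range_search`, the single-pivot layout, and the use of Hamming distance as a stand-in metric are all assumptions made for the example, and the loop here is still linear in the number of items. Tree-structured indexes apply the same test recursively to skip whole subtrees.

```python
def hamming(a, b):
    """Stand-in metric for the example: mismatches between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def range_search(query, items, pivot, dist, radius):
    """Return (hits, evaluated): items within `radius` of `query`, and how many
    query-to-item distances had to be computed after pruning."""
    d_qp = dist(query, pivot)
    # In a real index these pivot distances are precomputed once and stored.
    pivot_d = {x: dist(pivot, x) for x in items}
    hits, evaluated = [], 0
    for x in items:
        # Triangle inequality: d(query, x) >= |d(query, pivot) - d(pivot, x)|,
        # so x cannot be within the radius if that lower bound exceeds it.
        if abs(d_qp - pivot_d[x]) > radius:
            continue
        evaluated += 1
        if dist(query, x) <= radius:
            hits.append(x)
    return hits, evaluated

# Hypothetical four-residue peptides, chosen only for illustration.
items = ["ACDE", "ACDF", "WWWW", "WWWY"]
hits, evaluated = range_search("ACDG", items, pivot="ACDE", dist=hamming, radius=1)
print(hits, evaluated)
```

The pruning step is only valid when the distance obeys the triangle inequality, which log-odds similarity scores are not guaranteed to do.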
We have reworked the mathematics of the point accepted mutation (PAM) model by calculating the expected time between accepted mutations in lieu of calculating log-odds probabilities. The resulting substitution matrix (mPAM) forms a metric. We validate the application of the mPAM evolutionary model for sequence homology by executing sequence queries from a controlled yeast protein homology search benchmark. We compare the accuracy of the results of mPAM and PAM similarity matrices as well as three prior metric models. The experiment shows that mPAM significantly outperforms the other three metrics and sufficiently approaches the sensitivity of PAM250 to make it applicable to the management of protein sequence databases.
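The claim that a substitution matrix forms a metric can be checked mechanically. The sketch below tests the four metric axioms (non-negativity, identity, symmetry, triangle inequality) on a small matrix. The function name `is_metric` and the toy distances are hypothetical and are not actual mPAM values; the same check could be run on any full 20 x 20 matrix.

```python
from itertools import product

def is_metric(symbols, dist):
    """Return True if dist[a][b] satisfies the metric axioms over `symbols`."""
    for a, b in product(symbols, repeat=2):
        d_ab = dist[a][b]
        if d_ab < 0:                      # non-negativity
            return False
        if (d_ab == 0) != (a == b):       # zero distance only to itself
            return False
        if d_ab != dist[b][a]:            # symmetry
            return False
    for a, b, c in product(symbols, repeat=3):
        if dist[a][c] > dist[a][b] + dist[b][c]:   # triangle inequality
            return False
    return True

residues = ["A", "G", "W"]

# Hypothetical toy distances; NOT taken from the mPAM matrix.
toy = {
    "A": {"A": 0, "G": 2, "W": 7},
    "G": {"A": 2, "G": 0, "W": 8},
    "W": {"A": 7, "G": 8, "W": 0},
}

# Same matrix with d(G, W) raised to 12, which exceeds d(G, A) + d(A, W) = 9.
broken = {
    "A": {"A": 0, "G": 2, "W": 7},
    "G": {"A": 2, "G": 0, "W": 12},
    "W": {"A": 7, "G": 12, "W": 0},
}

print(is_metric(residues, toy), is_metric(residues, broken))
```

A log-odds similarity matrix such as PAM250 would fail this check as written, since it contains negative entries and non-zero diagonal scores; that is the gap a metric reformulation has to close.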