How sequence alignment scores correspond to probability models.

Affiliations

Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan.

Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8568, Japan.

Publication information

Bioinformatics. 2020 Jan 15;36(2):408-415. doi: 10.1093/bioinformatics/btz576.

Abstract

MOTIVATION

Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts and measure sequence similarity integrated over possible alignments.

RESULTS

This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a 'temperature' parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias toward either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment.
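As a rough illustration of converting scores to probabilities through a partition function, the sketch below sums exp(score / T) over all global alignments of two short sequences. It is not the paper's own formulation: the function names, the match/mismatch/gap values, and the use of linear gap costs are all assumptions made for brevity, and alignments that differ only in the order of adjacent insertions and deletions are counted as distinct paths.

```python
import math


def partition_function(x, y, match=1.0, mismatch=-1.0, gap=-1.0, temperature=1.0):
    """Sum of exp(score / temperature) over all global alignments of x and y.

    Hypothetical scoring scheme: one score per aligned pair, one linear
    cost per gap column. Not taken from the paper.
    """
    t = temperature
    w_gap = math.exp(gap / t)  # weight of a single insertion or deletion
    z = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    z[0][0] = 1.0  # the empty alignment has score 0, hence weight 1
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                z[i][j] += z[i - 1][j - 1] * math.exp(s / t)  # aligned pair
            if i > 0:
                z[i][j] += z[i - 1][j] * w_gap  # letter of x opposite a gap
            if j > 0:
                z[i][j] += z[i][j - 1] * w_gap  # letter of y opposite a gap
    return z[-1][-1]


def alignment_probability(score, z, temperature=1.0):
    """Probability of one alignment with the given total score, given Z."""
    return math.exp(score / temperature) / z


z = partition_function("A", "A")
p_match = alignment_probability(1.0, z)  # the single-column alignment A/A
```

Under these assumptions, lowering the temperature concentrates the probability on the highest-scoring alignment, so `temperature * math.log(z)` approaches the usual maximum alignment score, while higher temperatures spread probability over more alternative alignments.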

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
