Using substitution probabilities to improve position-specific scoring matrices.

Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method outperformed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
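The following is a minimal sketch of one way to combine per-column diversity with substitution probabilities when computing pseudo-counts; it is an illustration, not the authors' published implementation. Assumptions not stated in the abstract: diversity is measured as R, the number of distinct residue types in the column; the total pseudo-count weight is B = m·R with m = 5; and conditional substitution probabilities P(i | j) are derived from a symmetric joint-probability matrix. The three-letter alphabet and the `JOINT` values are hypothetical toy numbers chosen only so the example is self-contained (a real profile would use all 20 amino acids and a matrix such as BLOSUM62).

```python
from collections import Counter

# Hypothetical toy joint probabilities q[i][j] (symmetric, sums to 1).
JOINT = {
    "A": {"A": 0.30, "L": 0.05, "W": 0.02},
    "L": {"A": 0.05, "L": 0.35, "W": 0.03},
    "W": {"A": 0.02, "L": 0.03, "W": 0.15},
}


def substitution_probs(joint):
    """Return P(i | j) = q[i][j] / p[j], where p[j] is the marginal of j."""
    letters = list(joint)
    background = {j: sum(joint[i][j] for i in letters) for j in letters}
    return {i: {j: joint[i][j] / background[j] for j in letters} for i in letters}


def position_based_profile(column, joint, m=5.0):
    """Estimate residue probabilities for one alignment column.

    column: observed residues at this position (gaps omitted).
    m: assumed multiplier; total pseudo-count weight is B = m * R.
    """
    cond = substitution_probs(joint)
    counts = Counter(column)
    n_total = sum(counts.values())
    distinct = len(counts)   # R: diversity observed at this position
    b_total = m * distinct   # B: more diverse columns get more pseudo-counts
    profile = {}
    for i in joint:
        # Distribute B over residue i according to how often each observed
        # residue j is replaced by i, weighted by j's observed frequency.
        pseudo = b_total * sum((counts[j] / n_total) * cond[i][j] for j in counts)
        profile[i] = (counts[i] + pseudo) / (n_total + b_total)
    return profile


profile = position_based_profile("AAAL", JOINT)
print({k: round(v, 4) for k, v in profile.items()})
```

Because each conditional distribution P(· | j) sums to 1, the pseudo-counts for a column sum to exactly B, so the estimated probabilities sum to 1. The unobserved residue `W` still receives a small nonzero probability, which is the point of adding pseudo-counts.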
