Brown M, Hughey R, Krogh A, Mian I S, Sjölander K, Haussler D
University of California, Santa Cruz 95064, USA.
Proc Int Conf Intell Syst Mol Biol. 1993;1:47-55.
A Bayesian method is introduced for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family, or in the columns of a multiple alignment of that family. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It is shown that this Bayesian method can improve the quality of HMMs produced from small training sets. Specific experiments on the EF-hand motif are reported, in which these priors produce HMMs with higher likelihood on unseen data and with fewer false positives and false negatives in a database search task.
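As an illustration of the kind of estimate the abstract describes, the following minimal sketch computes the posterior mean of a residue distribution from observed counts under a Dirichlet mixture prior, using the standard Dirichlet-multinomial formulas. It is not taken from the paper: the function names (`log_beta`, `posterior_mean`), the three-letter toy alphabet, and the mixture parameters are all hypothetical, chosen only to keep the example small. A real application would use 20 amino acids and a mixture estimated from existing alignments.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def posterior_mean(counts, weights, alphas):
    """Posterior mean distribution under a Dirichlet mixture prior.

    counts  : observed counts n_i for one column or HMM state
    weights : mixture coefficients q_j (assumed to sum to 1)
    alphas  : one Dirichlet parameter vector alpha_j per component
    """
    # Posterior weight of component j is proportional to
    # q_j * B(n + alpha_j) / B(alpha_j); the multinomial coefficient
    # is the same for every component and cancels on normalisation.
    log_w = []
    for q, alpha in zip(weights, alphas):
        shifted = [n + a for n, a in zip(counts, alpha)]
        log_w.append(math.log(q) + log_beta(shifted) - log_beta(alpha))

    # Normalise in log space for numerical stability.
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    w = [x / z for x in w]

    # Mix the per-component posterior means (n_i + alpha_ji) / (|n| + |alpha_j|).
    total = sum(counts)
    estimate = [0.0] * len(counts)
    for wj, alpha in zip(w, alphas):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            estimate[i] += wj * (n + a) / denom
    return estimate


# Hypothetical two-component prior over a toy three-letter alphabet.
toy_weights = [0.5, 0.5]
toy_alphas = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0]]

# Three observations of the first letter only (a very small training set).
toy_estimate = posterior_mean([3, 0, 0], toy_weights, toy_alphas)
```

With only three observations, the estimate still assigns non-zero probability to the unseen letters, and the component whose parameters resemble the observed counts dominates the mixture. This is the sense in which such priors can help when training sets are small.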