Grundy W N, Bailey T L, Elkan C P, Baker M E
Department of Computer Science and Engineering, San Diego Supercomputer Center, California, USA.
Biochem Biophys Res Commun. 1997 Feb 24;231(3):760-6. doi: 10.1006/bbrc.1997.6193.
The increasing size of protein sequence databases is straining methods of sequence analysis, even as the increased information offers opportunities for sophisticated analyses of protein structure, function, and evolution. Here we describe a method that uses artificial intelligence-based algorithms to build models of families of protein sequences. These models can be used to search protein sequence databases for remote homologs. The MEME (Multiple Expectation-maximization for Motif Elicitation) software package identifies motif patterns in a protein family, and these motifs are combined into a hidden Markov model (HMM) for use as a database searching tool. Meta-MEME is sensitive and accurate, as well as automated and unbiased, making it suitable for the analysis of large datasets. We demonstrate Meta-MEME on a family of dehydrogenases that includes mammalian 11 beta-hydroxysteroid and 17 beta-hydroxysteroid dehydrogenase and their homologs in the short chain alcohol dehydrogenase family. We chose this dataset because it is large and phylogenetically diverse, providing a good test of the sensitivity and selectivity of Meta-MEME on a protein family of biological interest. Indeed, Meta-MEME identifies at least 350 members of this family in Genpept96 and clearly separates these sequences from non-homologous proteins. We also show how the MEME motif output can be used for phylogenetic analysis.
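The idea of combining ordered motifs into a single model for database searching can be illustrated with a toy sketch. This is not Meta-MEME's implementation: the motif columns, the uniform background frequency, the probability floor, and all function names below are hypothetical, and the spacers between motifs are treated as cost-free, which a real HMM with transition probabilities would not do. The dynamic program finds the best-scoring placement of the motifs in their given order without overlap, in the spirit of a Viterbi pass over a linear motif model.

```python
import math

# Assumption: uniform background over the 20 amino acids (illustrative only).
BACKGROUND = 0.05


def log_odds_column(probs, floor=0.01):
    """Return a scorer mapping a residue to its log-odds against background.

    `probs` is a hypothetical {residue: probability} motif column; residues
    not listed receive the small `floor` probability.
    """
    return lambda aa: math.log(probs.get(aa, floor) / BACKGROUND)


def motif_score(motif, seq, start):
    """Sum the column log-odds for a motif placed at seq[start:start+len(motif)]."""
    return sum(col(seq[start + i]) for i, col in enumerate(motif))


def best_ordered_score(motifs, seq):
    """Best total score placing every motif once, in order, without overlap.

    Spacer residues between motifs contribute nothing (a simplification).
    Returns -inf when the sequence is too short to hold all motifs.
    """
    n = len(seq)
    neg_inf = float("-inf")
    # best[j]: best score for the motifs handled so far, using only seq[:j]
    best = [0.0] * (n + 1)
    for motif in motifs:
        width = len(motif)
        new = [neg_inf] * (n + 1)
        for j in range(1, n + 1):
            new[j] = new[j - 1]  # treat residue j-1 as spacer
            if j >= width and best[j - width] > neg_inf:
                placed = best[j - width] + motif_score(motif, seq, j - width)
                new[j] = max(new[j], placed)
        best = new
    return best[n]


# Two made-up two-column motifs, expected in the order A then B.
motif_a = [log_odds_column({"G": 0.9}), log_odds_column({"A": 0.9})]
motif_b = [log_odds_column({"Y": 0.9}), log_odds_column({"K": 0.9})]

in_order = best_ordered_score([motif_a, motif_b], "MMGAMMYKMM")
out_of_order = best_ordered_score([motif_a, motif_b], "MMYKMMGAMM")
print(in_order > out_of_order)
```

A sequence carrying both motifs in the expected order scores higher than one carrying them in the reverse order, which is the property that lets an ordered-motif model separate family members from sequences that merely contain isolated motif-like segments.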