Subdivision of the MDR superfamily of medium-chain dehydrogenases/reductases through iterative hidden Markov model refinement.

Affiliation

IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden.

Publication information

BMC Bioinformatics. 2010 Oct 27;11:534. doi: 10.1186/1471-2105-11-534.

Abstract

BACKGROUND

The Medium-chain Dehydrogenases/Reductases (MDR) form a protein superfamily whose size and complexity defeat traditional means of subclassification: it currently has over 15000 members in the databases, pairwise sequence identity is typically around 25%, there are members from all kingdoms of life, chain lengths and oligomeric states vary, and the members take part in a multitude of biological processes. Profile hidden Markov models (HMMs) are available for detecting MDR superfamily members, but none for determining which MDR family each protein belongs to. The current torrential influx of new sequence data enables elucidation of more and more protein families, at an increasingly fine granularity. However, gathering good-quality training data usually requires manual attention by experts and has therefore been the rate-limiting step for expanding the number of available models.

RESULTS

We have developed an automated algorithm for HMM refinement that produces stable and reliable models for protein families. This algorithm uses relationships found in the data to generate confident seed sets. Using this algorithm, we have produced HMMs for 86 distinct MDR families and 34 of their subfamilies, which can be used in automated annotation of new sequences. We find that MDR forms with 2 Zn2+ ions are in general dehydrogenases, while MDR forms with no Zn2+ are in general reductases. Furthermore, in Bacteria, MDRs without Zn2+ are more frequent than those with Zn2+, while the opposite is true for eukaryotic MDRs, indicating that Zn2+ was recruited into the MDR superfamily after the initial separation of the kingdoms of life. We have also developed a web site, http://mdr-enzymes.org, that provides textual and numeric search against various characterised MDR family properties, as well as sequence scan functions for reliable classification of novel MDR sequences.
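The abstract does not spell out the refinement procedure, so the following is only a toy sketch of the general idea of iterating a model until its member set stops changing. It assumes ungapped, equal-length sequences, replaces the profile HMM with a simple per-column log-odds profile, and uses a fixed score threshold; the names build_profile, score and refine, the pseudocount, and the threshold are all hypothetical and are not taken from the paper. A real pipeline would build and search profile HMMs with a dedicated package such as HMMER.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(seqs, pseudocount=1.0):
    """Per-column log-odds profile (toy stand-in for a profile HMM).

    Assumes all sequences have the same length and contain no gaps.
    """
    background = 1.0 / len(ALPHABET)  # uniform background, an assumption
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum of per-column log-odds for one sequence."""
    return sum(col[aa] for col, aa in zip(profile, seq))


def refine(seed, pool, threshold, max_rounds=20):
    """Rebuild the model from its current members until membership is stable.

    The seed is always kept as a member (a design choice of this sketch).
    """
    members = set(seed)
    for _ in range(max_rounds):
        profile = build_profile(sorted(members))
        hits = {s for s in pool if score(profile, s) >= threshold}
        new_members = hits | set(seed)
        if new_members == members:  # converged: the model reproduces its own training set
            break
        members = new_members
    return members, build_profile(sorted(members))


if __name__ == "__main__":
    seed = ["ACDEFG", "ACDEFH"]
    pool = ["ACDEFG", "ACDEFH", "ACDEYG", "ACDKFG", "WWWWWW"]
    members, _ = refine(seed, pool, threshold=3.0)
    print(sorted(members))
```

In this sketch a model counts as stable once a rebuilt profile recovers exactly the sequences it was trained on, which is one simple reading of "stable and reliable models"; the published algorithm may use different criteria.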

CONCLUSIONS

Our method of refinement can be readily applied to create stable and reliable HMMs for both MDR and other protein families, and to confidently subdivide large and complex protein superfamilies. HMMs created using this algorithm correspond to evolutionary entities, making resolution of overlapping models straightforward. The implementation and support scripts for running the algorithm on computer clusters are available as open source software, and the database files underlying the web site are freely downloadable. The web site makes our findings directly useful to non-bioinformaticians as well.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee6/2976758/1597fefae9ad/1471-2105-11-534-1.jpg
