On single and multiple models of protein families for the detection of remote sequence relationships.

Authors

Casbon James A, Saqi Mansoor A S

Affiliation

Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK.

Publication

BMC Bioinformatics. 2006 Jan 31;7:48. doi: 10.1186/1471-2105-7-48.

Abstract

BACKGROUND

Detecting a relationship between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily by direct comparison of the two sequences. Methods that compare sequence profiles have been shown to improve the detection of these remote sequence relationships, but the best method for building a profile from a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the accuracy of the resulting alignments. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known members of the superfamily, or to use multiple sequence-based profiles, each representing an individual member of the superfamily.

RESULTS

Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, a truncated receiver operating characteristic (ROC5) shows that multiple domain models outperform single superfamily models, except at low error rates, where the two kinds of model behave similarly. However, performance varies widely between superfamilies. For 12% of all superfamilies, the ROC5 value of the superfamily model exceeds that of the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement over the superfamily model.
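The ROC5 measure can be illustrated with a short sketch. The abstract does not give the exact formula, so the code below assumes the commonly used ROC_n definition: the area under the ROC curve up to the first n false positives, normalised by n times the number of true homologs. The function names `roc_n` and `fraction_gain_above`, the padding rule for lists with fewer than n false positives, and the example data are all illustrative assumptions and are not taken from the paper.

```python
from typing import Mapping, Optional, Sequence


def roc_n(labels_ranked: Sequence[bool], n: int = 5,
          total_positives: Optional[int] = None) -> float:
    """Truncated ROC score (assumed ROC_n definition, not the paper's code).

    labels_ranked: hit labels ordered from best to worst score,
                   True for a true homolog, False for a false positive.
    total_positives: number of true homologs in the benchmark; defaults
                     to the number of True labels in the list.
    """
    if total_positives is None:
        total_positives = sum(labels_ranked)
    if n <= 0 or total_positives <= 0:
        raise ValueError("n and total_positives must be positive")

    true_seen = 0   # true homologs ranked above the current position
    false_seen = 0  # false positives encountered so far
    area = 0        # sum over false positives of true homologs ranked above
    for is_true in labels_ranked:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranking below every listed hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)


def fraction_gain_above(a: Mapping[str, float], b: Mapping[str, float],
                        threshold: float = 0.2) -> float:
    """Fraction of shared superfamilies where a's score beats b's by more than threshold."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no superfamilies in common")
    return sum(1 for k in shared if a[k] - b[k] > threshold) / len(shared)


if __name__ == "__main__":
    # Hypothetical ranked hit list: True = true homolog, False = false positive.
    hits = [True, True, False, True, False]
    print(roc_n(hits, n=2))
```

With per-superfamily scores collected in two dictionaries (one for superfamily models, one for domain models), `fraction_gain_above(superfamily_scores, domain_scores)` and the reversed call give the kind of percentages quoted above, under the assumption that "greater than 0.2 above" means a difference in ROC5 strictly larger than 0.2.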

CONCLUSION

Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence-based models (domain models) in detecting remote superfamily members. We find that, overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd10/1397874/ee72c04b0b2d/1471-2105-7-48-1.jpg
