Considering scores between unrelated proteins in the search database improves profile comparison.

Affiliation

Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA.

Publication information

BMC Bioinformatics. 2009 Dec 4;10:399. doi: 10.1186/1471-2105-10-399.

Abstract

BACKGROUND

Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of the statistical significance of detected profile similarities is essential for further development of this methodology. Here we analyze a novel approach to estimating the statistical significance of profile similarity: the explicit consideration of background score distributions for each database template (subject).

RESULTS

Using a simple scheme to combine and analytically approximate query- and subject-based distributions, we show that (i) inclusion of background distributions for the subjects increases the quality of homology detection; (ii) this increase is larger when the distributions are based on the scores to all known non-homologs of the subject rather than on a small calibration subset of database representatives; and (iii) the subject's score distributions over all known non-homologs make the dominant contribution to the improved performance: adding the calibration distribution of the query has a negligible additional effect.
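The abstract does not spell out the analytical approximation or the combination rule, so the following is only a minimal sketch of the general idea, under stated assumptions: background scores are approximated by a Gumbel (extreme value) distribution fitted by the method of moments, and the query- and subject-based P-values are combined with the product rule for two independent P-values. The function names `fit_gumbel`, `gumbel_pvalue`, and `combine_pvalues`, the choice of distribution, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments Gumbel fit (an assumed approximation).

    Returns (mu, beta): location and scale of the fitted distribution.
    """
    beta = stdev(scores) * math.sqrt(6) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under a Gumbel distribution with parameters mu, beta."""
    z = (score - mu) / beta
    if z < -30:  # avoid overflow in exp(-z); the tail probability is 1 here
        return 1.0
    return -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z))


def combine_pvalues(p_query, p_subject):
    """Product rule for two independent P-values (assumed, not the paper's rule).

    For independent uniform p1, p2: P(p1 * p2 <= t) = t * (1 - ln t).
    """
    t = p_query * p_subject
    if t <= 0.0:
        return 0.0
    return min(1.0, t * (1.0 - math.log(t)))


# Hypothetical background scores: the subject against its known non-homologs,
# and the query against a small calibration set.
subject_background = [10.0, 12.0, 14.0, 16.0, 18.0, 11.0, 13.0, 15.0]
query_background = [9.0, 11.0, 12.0, 14.0, 17.0]

observed_score = 25.0
p_subject = gumbel_pvalue(observed_score, *fit_gumbel(subject_background))
p_query = gumbel_pvalue(observed_score, *fit_gumbel(query_background))
print(p_subject, p_query, combine_pvalues(p_query, p_subject))
```

In this sketch the subject-only P-value `p_subject` corresponds to finding (i), and the combined value corresponds to adding the query calibration distribution, which finding (iii) reports as having a negligible additional effect.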

CONCLUSION

The construction of distributions based on the complete sets of non-homologs for each subject is particularly relevant in the setting of structure prediction, where the database consists of proteins with solved 3D structures (PDB, SCOP, CATH, etc.) and structural relationships between proteins are therefore known. These results point to a potential new direction in the development of more powerful methods for remote homology detection.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9232/3087343/007c44cc3206/1471-2105-10-399-1.jpg
