Optimal data collection for correlated mutation analysis.

Author information

Haim Ashkenazy, Ron Unger, Yossef Kliger

Affiliation

Compugen LTD, Tel Aviv 69512, Israel.

Publication information

Proteins. 2009 Feb 15;74(3):545-55. doi: 10.1002/prot.22168.

DOI:10.1002/prot.22168

PMID:18655065

Abstract

The main objective of correlated mutation analysis (CMA) is to predict intraprotein residue-residue interactions from sequence alone. Despite considerable progress in algorithms and computer capabilities, the performance of CMA methods remains quite low. Here we examine whether, and to what extent, the quality of CMA methods depends on the sequences that are included in the multiple sequence alignment (MSA). The results revealed a strong correlation between the number of homologs in an MSA and CMA prediction strength. Furthermore, whereas many current methods include only orthologs in the MSA, we found that it is beneficial to include both orthologs and paralogs. Remarkably, even remote homologs contribute to the improved accuracy. Based on our findings, we put forward an automated data collection procedure with a minimum coverage of 50% between the query protein and its orthologs and paralogs. This procedure improves accuracy even in the absence of manual curation. In this era of massive sequencing and exploding sequence data, our results suggest that correlated mutation-based methods have not reached their inherent performance limitations and that the role of CMA in structural biology is far from being fulfilled.
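The abstract describes the automated collection step only in outline. The following is a minimal sketch of a 50% coverage filter, not the authors' implementation. It assumes that the query and its homologs are already rows of one MSA, with '-' marking gaps. It also assumes that "coverage" means the fraction of query residues aligned to a residue in the homolog. The paper's exact definition may differ (for example, coverage could be measured on a pairwise search alignment instead). The function names and toy sequences are hypothetical. No orthology test is applied, so orthologs, paralogs, and remote homologs all pass as long as they meet the threshold.

```python
from typing import Dict

GAP = "-"  # assumed gap character in the aligned rows


def query_coverage(query_aln: str, hit_aln: str) -> float:
    """Fraction of query residues aligned to a non-gap residue in the hit.

    Hypothetical definition: both strings are rows of the same MSA
    (equal length), and only columns holding a query residue are counted.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned sequences must have equal length")
    query_residues = 0
    covered = 0
    for q, h in zip(query_aln, hit_aln):
        if q == GAP:
            continue  # skip columns where the query itself has a gap
        query_residues += 1
        if h != GAP:
            covered += 1
    return covered / query_residues if query_residues else 0.0


def collect_homologs(query_aln: str, hits: Dict[str, str],
                     min_coverage: float = 0.5) -> Dict[str, str]:
    """Keep every homolog whose query coverage is at least min_coverage.

    Orthologs and paralogs are treated alike: there is no orthology filter,
    so any homolog that meets the coverage threshold is retained.
    """
    return {name: seq for name, seq in hits.items()
            if query_coverage(query_aln, seq) >= min_coverage}


if __name__ == "__main__":
    query = "MKTAYIAKQR"
    hits = {
        "ortholog_A": "MKTAYIAKQR",  # covers 10/10 query residues
        "paralog_B": "MKSAY-----",   # covers 5/10, meets the 50% threshold
        "remote_C": "--T-Y-----",    # covers 2/10, filtered out
    }
    kept = collect_homologs(query, hits)
    print(sorted(kept))  # ['ortholog_A', 'paralog_B']
```

In this toy example, the hypothetical paralog_B covers exactly half of the query and is kept. The hypothetical remote_C covers only 2 of 10 query residues and is dropped. A complete pipeline would also need the homolog search and alignment steps, which this sketch leaves out.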
