DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning.

Affiliation

Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.

Publication information

BMC Bioinformatics. 2011 Feb 1;12:43. doi: 10.1186/1471-2105-12-43.

Abstract

BACKGROUND

Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved.

RESULTS

We developed a new method that integrates the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines using input features such as sequence profiles, secondary structures, and solvent accessibilities around the sites, together with the positions of the sites. In 10-fold cross-validation on a domain benchmark, the method recalled 60% of true domain boundaries at a precision of 60%. The trade-off between precision and recall can be adjusted to specific needs by applying different decision thresholds to the domain boundary scores assigned by the support vector machines.
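The threshold-based trade-off described above can be illustrated with a minimal Python sketch. It is not the DoBo implementation: the function name `precision_recall`, the candidate-site scores, and the true/false labels are all hypothetical values chosen for illustration, standing in for the scores a support vector machine might assign to putative boundary sites.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when sites scoring >= threshold are called boundaries.

    Conventions assumed here: precision is 1.0 when nothing is predicted,
    and recall is 0.0 when there are no true boundaries.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical scores for six candidate sites (not data from the paper).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
# Hypothetical ground truth: True marks a real domain boundary.
labels = [True, True, False, True, False, False]

for threshold in (0.2, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.7 moves precision from 0.60 to 1.00 while recall falls from 1.00 to about 0.67, which is the kind of adjustment the abstract refers to.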

CONCLUSIONS

The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
