Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China.
Biomolecules. 2020 Apr 17;10(4):626. doi: 10.3390/biom10040626.
Protein structures play an important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determination of protein structure is prohibitively costly and time-consuming, and computational prediction of protein structures has not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by the ML approach they employ (support vector machines, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodological viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML and deep learning (DL) techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and suggests directions for future research in this area.