Tadepalli Sivani, Akhter Nasrin, Barbara Daniel, Shehu Amarda
IEEE Trans Nanobioscience. 2020 Jul;19(3):562-570. doi: 10.1109/TNB.2020.2990642. Epub 2020 Apr 27.
The three-dimensional structures populated by a protein molecule largely determine its biological activities. Because protein structure encodes rich information about protein function, there is continued motivation to develop computational approaches for determining functionally relevant structures. However, the majority of structures generated in silico are not relevant, and discriminating relevant (native) protein structures from non-native ones remains an outstanding challenge in computational structural biology. This is inherently a recognition problem that can be addressed with machine learning. In this paper, based on the premise that near-native structures are effectively anomalies, we build on the concept of anomaly detection in machine learning. We propose methods that automatically select relevant subsets of structures, as well as methods that select a single structure to offer as the prediction. Evaluations on benchmark datasets demonstrate that the proposed methods advance the state of the art. The results motivate further adapting concepts and techniques from machine learning to improve recognition of near-native structures in protein structure prediction.