Evaluating Autoencoder-Based Featurization and Supervised Learning for Protein Decoy Selection.

Affiliations

Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.

Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA.

Publication information

Molecules. 2020 Mar 4;25(5):1146. doi: 10.3390/molecules25051146.

Abstract

Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that structure indeed encodes function. Research on featurization of protein structure is active; here we assess the promise of autoencoders. Motivated by rapid progress in neural network research, we investigate and evaluate autoencoders for yielding linear and nonlinear featurizations of protein tertiary structures. An additional reason we focus on autoencoders as the engine to obtain featurizations is the versatility of their architectures and the ease with which changes to architecture yield linear versus nonlinear features. While open-source neural network libraries, such as Keras, which we employ here, greatly facilitate constructing, training, and evaluating autoencoder architectures and conducting model search, autoencoders have not yet gained popularity in the structural biology community. Here we demonstrate their utility in a practical context. Employing autoencoder-based featurizations, we address the classic problem of decoy selection in protein structure prediction. Utilizing off-the-shelf supervised learning methods, we demonstrate that the featurizations are indeed meaningful and allow detecting active tertiary structures, thus opening the way for further avenues of research.
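The abstract's point that a small change in architecture switches an autoencoder between linear and nonlinear features can be illustrated with a minimal sketch. This is not the authors' Keras model: it swaps in plain NumPy with hand-written gradient descent so that it runs without a deep learning library, and it uses synthetic data in place of protein structures. The function names (`make_toy_data`, `train_autoencoder`), the single hidden layer, the code dimension of 2, and the assumption that each tertiary structure has already been flattened into a fixed-length vector are all hypothetical choices made for illustration.

```python
import numpy as np


def make_toy_data(n_samples=200, n_features=30, seed=0):
    """Synthetic stand-in for flattened structure vectors (NOT protein data)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 2))
    mixing = rng.normal(size=(2, n_features))
    X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
    # Standardize each feature to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, code_dim=2, nonlinear=False, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    nonlinear=False uses an identity activation (linear features);
    nonlinear=True uses tanh (nonlinear features).
    Returns (encode, losses): the encoder function and the per-epoch MSE.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, code_dim))  # encoder weights
    b1 = np.zeros(code_dim)
    W2 = rng.normal(scale=0.1, size=(code_dim, n_features))  # decoder weights
    b2 = np.zeros(n_features)

    def activation(z):
        return np.tanh(z) if nonlinear else z

    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the bottleneck, then reconstruct.
        H = activation(X @ W1 + b1)
        X_hat = H @ W2 + b2
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_H = g_out @ W2.T
        # Derivative of tanh is 1 - tanh^2; the identity has derivative 1.
        g_Z = g_H * (1.0 - H ** 2) if nonlinear else g_H
        g_W1 = X.T @ g_Z
        g_b1 = g_Z.sum(axis=0)

        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(X_new):
        """Map inputs to their bottleneck features."""
        return activation(X_new @ W1 + b1)

    return encode, losses


if __name__ == "__main__":
    X = make_toy_data()
    for flag in (False, True):
        encode, losses = train_autoencoder(X, nonlinear=flag)
        kind = "nonlinear" if flag else "linear"
        print(kind, "features:", encode(X).shape, "final MSE:", losses[-1])
```

The only difference between the two variants is the activation applied at the bottleneck. In a setting like the one the abstract describes, the encoded features would then be passed to an off-the-shelf supervised learner; that step is omitted here because the abstract does not specify the classifier or the labels.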

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7258/7179114/63c3f4045185/molecules-25-01146-g001.jpg
