3D deep convolutional neural networks for amino acid environment similarity analysis.

Authors

Torng Wen, Altman Russ B

Affiliations

Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.

Department of Genetics, Stanford University, Stanford, CA, 94305, USA.

Publication information

BMC Bioinformatics. 2017 Jun 14;18(1):302. doi: 10.1186/s12859-017-1702-0.

Abstract

BACKGROUND

Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.
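To make the idea of a "raw atom distribution" concrete, the following is a minimal sketch of how a local microenvironment might be turned into a 3D grid that a convolutional network could take as input. The abstract does not give the box size, voxel resolution, or channel layout, so the 20 Å box, 1 Å voxels, and one channel per element (C, N, O, S) are assumptions made for illustration, and `voxelize` is a hypothetical helper name rather than the authors' code.

```python
import numpy as np

# Assumed channel layout: one channel per element type (illustrative only).
CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}

def voxelize(coords, elements, center, box_size=20.0, resolution=1.0):
    """Count atoms per voxel in a cubic box around `center`, one channel per element."""
    n = int(round(box_size / resolution))
    grid = np.zeros((len(CHANNELS), n, n, n), dtype=np.float32)
    # Shift coordinates so the lower corner of the box is the origin,
    # then convert positions to integer voxel indices.
    corner = np.asarray(center, dtype=float) - box_size / 2.0
    idx = np.floor((np.asarray(coords, dtype=float) - corner) / resolution).astype(int)
    for (i, j, k), elem in zip(idx, elements):
        ch = CHANNELS.get(elem)
        inside = 0 <= i < n and 0 <= j < n and 0 <= k < n
        # Atoms outside the box or of an unlisted element are skipped.
        if ch is not None and inside:
            grid[ch, i, j, k] += 1.0
    return grid

# Toy usage: three atoms, the last one far outside the box.
grid = voxelize(
    coords=[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [100.0, 0.0, 0.0]],
    elements=["C", "O", "N"],
    center=[0.0, 0.0, 0.0],
)
```

The resulting array has shape (channels, x, y, z), which is the layout most 3D convolution layers expect once a batch dimension is added.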

RESULTS

Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.
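The abstract says the substitution matrices are built from prediction statistics but does not give the formula. One plausible reading, sketched below, is a log-odds score computed from a confusion matrix of true versus predicted amino acids. The function name `substitution_scores`, the pseudocount, and the log-odds form are all assumptions for illustration, not the authors' published construction.

```python
import numpy as np

def substitution_scores(confusion, pseudocount=1.0):
    """Log-odds scores from confusion counts (rows: true class, columns: predicted class)."""
    counts = np.asarray(confusion, dtype=float) + pseudocount
    # P(predicted j | true i): how often environments of residue i are assigned residue j.
    cond = counts / counts.sum(axis=1, keepdims=True)
    # P(predicted j): overall frequency with which residue j is predicted.
    background = counts.sum(axis=0) / counts.sum()
    # Positive score: j is predicted in i's environments more often than its background rate.
    return np.log2(cond / background)

# Toy two-class example; a real matrix would be 20 x 20, one row per amino acid.
scores = substitution_scores([[8, 2], [2, 8]], pseudocount=0.0)
```

Under this reading, a high off-diagonal score means the network often finds one residue's environment compatible with another residue, which is the kind of signal a substitution matrix is meant to capture.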

CONCLUSIONS

End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeab/5472009/d1d23ed0dd80/12859_2017_1702_Fig1_HTML.jpg