3D deep convolutional neural networks for amino acid environment similarity analysis.

Authors

Torng Wen, Altman Russ B

Affiliations

Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.

Department of Genetics, Stanford University, Stanford, CA, 94305, USA.

Publication information

BMC Bioinformatics. 2017 Jun 14;18(1):302. doi: 10.1186/s12859-017-1702-0.

DOI:10.1186/s12859-017-1702-0

PMID:28615003

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5472009/

Abstract

BACKGROUND

Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.
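To make "raw atom distribution" concrete, here is a minimal sketch of one way a local microenvironment could be turned into the kind of multi-channel 3D grid a 3DCNN consumes. The abstract does not specify the grid; the 20 Å box, 1 Å voxels, the four element channels, and the `voxelize` helper are all illustrative assumptions, not the authors' stated preprocessing.

```python
import math
from collections import Counter

# Assumed element channels; the abstract does not list them.
CHANNELS = ("C", "N", "O", "S")

def voxelize(atoms, center, box=20.0, voxel=1.0):
    """Return sparse per-element voxel counts for atoms inside a cubic box.

    atoms  : iterable of (element, (x, y, z)) tuples, coordinates in angstroms
    center : (x, y, z) point the box is centered on
    Keys of the result are (channel_index, i, j, k).
    """
    n = int(round(box / voxel))   # voxels per box edge
    half = box / 2.0
    grid = Counter()
    for element, xyz in atoms:
        if element not in CHANNELS:
            continue              # skip elements without a channel
        # Voxel index along each axis, measured from the box's lower corner.
        idx = tuple(math.floor((c - c0 + half) / voxel)
                    for c, c0 in zip(xyz, center))
        if all(0 <= v < n for v in idx):
            grid[(CHANNELS.index(element),) + idx] += 1
    return grid

# Toy environment: two atoms inside the box, one far outside it.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.5, 0.0, 0.0)), ("N", (50.0, 0.0, 0.0))]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
```

A sparse `Counter` keeps the sketch dependency-free; a real pipeline would more likely fill a dense array of shape (channels, n, n, n) and pass it to a deep learning framework.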

RESULTS

Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.
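The abstract says the substitution matrices are built "from the prediction statistics" but does not give the formula. One plausible reading, sketched below as an assumption, is a BLOSUM-style log-odds score computed from a confusion matrix of true versus predicted residues. The `substitution_matrix` function, the pseudocount, and the three-residue toy counts are hypothetical.

```python
import math

def substitution_matrix(counts, labels, pseudocount=1.0):
    """Log-odds scores from a confusion matrix (assumed formulation).

    counts[t][p] = number of environments whose true residue is labels[t]
    and whose predicted residue is labels[p].
    """
    k = len(labels)
    # Smooth the counts so that zero cells do not produce log(0).
    smoothed = [[counts[t][p] + pseudocount for p in range(k)] for t in range(k)]
    total = sum(map(sum, smoothed))
    row = [sum(r) / total for r in smoothed]                       # P(true = t)
    col = [sum(smoothed[t][p] for t in range(k)) / total           # P(pred = p)
           for p in range(k)]
    # Score = log2 of observed joint frequency over the independence expectation.
    return {
        (labels[t], labels[p]):
            math.log2((smoothed[t][p] / total) / (row[t] * col[p]))
        for t in range(k) for p in range(k)
    }

# Toy counts: L and I environments are sometimes confused, D never is.
labels = ["L", "I", "D"]
counts = [[8, 2, 0],
          [2, 8, 0],
          [0, 0, 10]]
scores = substitution_matrix(counts, labels)
```

Under this formulation, residues whose environments the network tends to confuse (here L and I) receive a higher off-diagonal score than residues it never confuses (L and D), which is the property a substitution matrix needs for mutation analysis.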

CONCLUSIONS

End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeab/5472009/d1d23ed0dd80/12859_2017_1702_Fig1_HTML.jpg
