Evaluation of protein descriptors in computer-aided rational protein engineering tasks and its application in property prediction in SARS-CoV-2 spike glycoprotein.

Author information

Lim Hocheol, Jeon Hyeon-Nae, Lim Seungcheol, Jang Yuil, Kim Taehee, Cho Hyein, Pan Jae-Gu, No Kyoung Tai

Affiliations

The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, Republic of Korea.

Department of Biotechnology, Yonsei University, Seoul, Republic of Korea.

Publication information

Comput Struct Biotechnol J. 2022 Jan 31;20:788-798. doi: 10.1016/j.csbj.2022.01.027. eCollection 2022.

DOI: 10.1016/j.csbj.2022.01.027

PMID: 35222841

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8841378/

Abstract

The importance of protein engineering in the research and development of biopharmaceuticals and biomaterials has increased. Machine learning in computer-aided protein engineering can markedly reduce the experimental effort needed to identify, from a large number of possible protein sequences, the optimal sequences that satisfy the desired properties. To develop general protein descriptors for computer-aided protein engineering tasks, we devised four new protein descriptors: one sequence-based descriptor (PCgrades) and three structure-based descriptors (PCspairs, 3D-SPIEs_5.4 Å, and 3D-SPIEs_8Å). PCgrades and PCspairs encode general, statistical physicochemical information for single and pairwise amino acids, respectively, whereas the 3D-SPIEs encode specific, quantum-mechanical information obtained from parameterized quantum mechanical calculations (FMO2-DFTB3/D/PCM). To evaluate the descriptors, we built prediction models with both the new and previously developed descriptors for diverse protein datasets, including protein expression and binding affinity change in the SARS-CoV-2 spike glycoprotein. The newly devised descriptors performed well across these datasets, and PCspairs performed best (… for protein expression and … for binding affinity). Similar approaches using these descriptors would be promising and useful if the prediction models are trained with sufficient quantitative experimental data from high-throughput assays for industrial enzymes or protein drugs.
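To make the single-residue versus pairwise distinction concrete, a minimal Python sketch follows. It is not the authors' PCgrades or PCspairs definition, which the abstract does not give: the Kyte-Doolittle hydropathy scale, the product-of-values pair term, the hypothetical function names `single_residue_features` and `pairwise_features`, and the 8 Å default cutoff (chosen only to echo the 3D-SPIEs_8Å name) are all illustrative assumptions.

```python
import math
from itertools import combinations

# Kyte-Doolittle hydropathy scale: one example of a per-residue
# physicochemical property (illustrative choice, not from the paper).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def single_residue_features(seq):
    """Sequence-based encoding (hypothetical): one property value per position."""
    return [KD_HYDROPATHY[aa] for aa in seq]


def pairwise_features(seq, coords, cutoff=8.0):
    """Structure-based encoding (hypothetical): for each residue pair whose
    coordinates lie within `cutoff` angstroms, store the product of the two
    residues' property values, keyed by the (i, j) position pair."""
    if len(seq) != len(coords):
        raise ValueError("need one coordinate per residue")
    features = {}
    for i, j in combinations(range(len(seq)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            features[(i, j)] = KD_HYDROPATHY[seq[i]] * KD_HYDROPATHY[seq[j]]
    return features


if __name__ == "__main__":
    # Three residues; only positions 0 and 1 lie within 8 angstroms.
    seq = "AKL"
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(single_residue_features(seq))            # [1.8, -3.9, 3.8]
    print(sorted(pairwise_features(seq, coords)))  # [(0, 1)]
```

Either output could be flattened into a fixed-length vector for a regression model. In the paper, by contrast, the 3D-SPIEs draw on parameterized quantum mechanical calculations (FMO2-DFTB3/D/PCM) rather than on a simple property product like this one.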

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0903/8841378/3ac2dd3ee9c8/ga1.jpg
