Evaluation of protein descriptors in computer-aided rational protein engineering tasks and its application in property prediction in SARS-CoV-2 spike glycoprotein.

Author information

Lim Hocheol, Jeon Hyeon-Nae, Lim Seungcheol, Jang Yuil, Kim Taehee, Cho Hyein, Pan Jae-Gu, No Kyoung Tai

Affiliations

The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, Republic of Korea.

Department of Biotechnology, Yonsei University, Seoul, Republic of Korea.

Publication information

Comput Struct Biotechnol J. 2022 Jan 31;20:788-798. doi: 10.1016/j.csbj.2022.01.027. eCollection 2022.

Abstract

Protein engineering has become increasingly important in the research and development of biopharmaceuticals and biomaterials. Machine learning in computer-aided protein engineering can markedly reduce the experimental effort needed to identify, from a large number of possible protein sequences, the optimal sequences that satisfy the desired properties. To develop general protein descriptors for computer-aided protein engineering tasks, we devised four new descriptors: one sequence-based descriptor (PCgrades) and three structure-based descriptors (PCspairs, 3D-SPIEs_5.4 Å, and 3D-SPIEs_8Å). PCgrades and PCspairs encode general, statistical information on the physicochemical properties of single amino acids and amino acid pairs, respectively, whereas the 3D-SPIEs encode specific quantum-mechanical information obtained from parameterized quantum mechanical calculations (FMO2-DFTB3/D/PCM). To evaluate the descriptors, we built prediction models with the new descriptors and with previously developed descriptors for diverse protein datasets, including protein expression and binding affinity change in the SARS-CoV-2 spike glycoprotein. The newly devised descriptors performed well across these datasets, and PCspairs performed best for both protein expression and binding affinity. Similar approaches with these descriptors would be promising and useful if the prediction models are trained with sufficient quantitative experimental data from high-throughput assays for industrial enzymes or protein drugs.
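
As a rough illustration of what a sequence-based descriptor does, the sketch below maps each residue of a sequence to a single physicochemical value, giving one feature per position that a regression model could consume. It is not the PCgrades descriptor: the Kyte-Doolittle hydropathy index is used purely as a stand-in property scale, and the function names `encode_sequence` and `mutation_delta` are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used here only as a stand-in property scale.
# Assumption: this is NOT the PCgrades descriptor described in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq):
    """Map each residue to one property value (one feature per position)."""
    features = []
    for aa in seq.upper():
        if aa not in HYDROPATHY:
            raise ValueError("unknown residue: " + aa)
        features.append(HYDROPATHY[aa])
    return features


def mutation_delta(wild_type, variant):
    """Per-position change in the property between a variant and the wild type."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    wt = encode_sequence(wild_type)
    var = encode_sequence(variant)
    return [v - w for w, v in zip(wt, var)]


if __name__ == "__main__":
    # Hypothetical six-residue fragment and a single N-to-K substitution.
    print(encode_sequence("NITNLC"))
    print(mutation_delta("NITNLC", "NITKLC"))
```

In a point-mutation dataset, the delta vector is zero everywhere except at substituted positions, so a model trained on it sees only the change each variant introduces. A structure-based pairwise descriptor would additionally need residue contacts from a 3D structure, which this sketch does not attempt.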

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0903/8841378/3ac2dd3ee9c8/ga1.jpg
