Protein asparagine deamidation prediction based on structures with machine learning methods.

Author information

Jia Lei, Sun Yaxiong

Affiliation

Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA, United States of America.

Publication information

PLoS One. 2017 Jul 21;12(7):e0181347. doi: 10.1371/journal.pone.0181347. eCollection 2017.

Abstract

Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (an asparagine followed by a glycine) as liable to deamidation. It still dominates the deamidation evaluation process in most pharmaceutical settings because of its convenience. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. The experimentally measured deamidation half-life of Asn in pentapeptides, together with 3D structure-based properties such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, was used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and tested with an external test data set. The random forest model showed high enrichment, ranking deamidated residues above non-deamidated residues while effectively eliminating false positive predictions. Such quantitative protein structure-function relationship tools may also be applicable to other protein hotspot predictions. In addition, we extensively discuss the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
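
As a rough illustration of the sequence-based baseline mentioned in the abstract, the NG-motif rule can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tool: the function name `find_ng_motifs` and the example sequence are hypothetical and chosen only for illustration.

```python
import re
from typing import List


def find_ng_motifs(sequence: str) -> List[int]:
    """Return 1-based positions of each Asn (N) immediately followed by Gly (G).

    Hypothetical helper illustrating the simple sequence-based rule;
    it uses no structural information at all.
    """
    seq = sequence.upper()
    # The lookahead tests for G without consuming it, so every N is
    # checked independently of its neighbours.
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", seq)]


# Hypothetical sequence, for illustration only (not from the paper).
example = "MKNGTLNNGAQNSG"
print(find_ng_motifs(example))  # prints [3, 8]
```

Every NG site is flagged regardless of its structural context, and Asn residues outside an NG motif are never flagged, which is the limitation the structure-based models are meant to address.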
