Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning.

Affiliations

Science for Life Laboratory, Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.

Publication information

J Mol Biol. 2020 Jul 24;432(16):4435-4446. doi: 10.1016/j.jmb.2020.05.021. Epub 2020 May 30.

Abstract

How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most result in only small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that domain length, or analogously size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
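As a rough illustration of the two figures quoted above, the sketch below computes a mean absolute error and a Pearson correlation coefficient between observed and predicted lDDT scores. It assumes that the "average accuracy" refers to a mean absolute error and that the correlation is Pearson's r; the abstract states neither. The function names and the five score pairs are hypothetical, invented for illustration only, and are not data from the study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired scores."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical lDDT scores for five domain pairs (illustrative values only).
observed = [0.95, 0.80, 0.62, 0.71, 0.88]
predicted = [0.93, 0.83, 0.60, 0.75, 0.85]

mae = mean_absolute_error(observed, predicted)
r = pearson_r(observed, predicted)
print(f"mean absolute error: {mae:.3f} lDDT")
print(f"Pearson r: {r:.2f}")
```

Reporting both numbers is useful because they capture different things: the error is in lDDT units and reflects how far individual predictions are off, while the correlation reflects how well the predictions track the ranking and spread of the observed similarities.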
