Sparse group selection and analysis of function-related residue for protein-state recognition.

Affiliations

Department of Management Science and Engineering, Tongji University, Shanghai, China.

Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, USA.

Publication information

J Comput Chem. 2022 Jul 30;43(20):1342-1354. doi: 10.1002/jcc.26937. Epub 2022 Jun 3.

Abstract

Machine learning methods have helped advance a wide range of scientific and technological fields in recent years, including computational chemistry. Because chemical systems can be complex and high-dimensional, feature selection is critical but challenging when developing reliable machine-learning-based prediction models, especially for proteins as bio-macromolecules. In this study, we applied the sparse group lasso (SGL) method as a general feature selection method to develop a classification model for an allosteric protein in different functional states. This results in a much improved model with comparable accuracy (Acc) and only 28 selected features, compared with 289 selected features in a previous study. The Acc reaches 91.50% with 1936 selected features, which is far higher than that of the baseline methods. In addition, grouping the protein's amino acids by secondary structure provides additional interpretability of the selected features. The selected features are verified to be associated with key allosteric residues through comparison with both experimental and computational work on the model protein, demonstrating the effectiveness and necessity of applying rigorous feature selection and evaluation methods to complex chemical systems.
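
To illustrate how a sparse group lasso penalty can zero out whole groups of features and also individual features inside the surviving groups, here is a minimal Python sketch of its proximal operator. It is not the authors' implementation. It assumes the commonly used SGL penalty lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1), where p_g is the size of group g. The names `soft_threshold`, `sgl_prox`, `lam`, `alpha`, and `step`, and the toy group labels (standing in for secondary-structure elements), are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal operator of the assumed sparse group lasso penalty.

    beta   : 1-D coefficient vector
    groups : 1-D array of group labels, one per coefficient
             (hypothetical stand-in for secondary-structure elements)
    lam    : overall penalty strength (assumed name)
    alpha  : mix between the L1 part (alpha) and the group part (1 - alpha)
    step   : step size of the surrounding proximal-gradient iteration
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        # Step 1: the L1 part removes individual weak features.
        z = soft_threshold(beta[idx], step * lam * alpha)
        # Step 2: the group part removes the whole group if what is left is weak.
        norm = np.linalg.norm(z)
        thresh = step * lam * (1.0 - alpha) * np.sqrt(idx.sum())
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * z
    return out


if __name__ == "__main__":
    # Toy data: four features in two groups of two.
    beta = np.array([3.0, -0.5, 0.2, 0.1])
    groups = np.array([0, 0, 1, 1])
    print(sgl_prox(beta, groups, lam=1.0, alpha=0.5))
```

In this toy call, the second group is dropped entirely and one feature inside the first group is also set to zero, which is the two-level sparsity that makes group-wise interpretation possible. In a full model this operator would be applied after each gradient step on the classification loss.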

