CaML: Chemistry-informed machine learning explains mutual changes between protein conformations and calcium ions in calcium-binding proteins using structural and topological features.

Authors

Zhang Pengzhi, Nde Jules, Eliaz Yossi, Jennings Nathaniel, Cieplak Piotr, Cheung Margaret S

Affiliations

Center for Bioinformatics and Computational Biology, Houston Methodist Research Institute, Houston, Texas, USA.

Department of Physics, University of Washington, Seattle, Washington, USA.

Publication information

Protein Sci. 2025 Feb;34(2):e70023. doi: 10.1002/pro.70023.

Abstract

Protein flexibility is a feature that communicates changes in cell signaling instigated by binding to second messengers such as calcium ions, which are associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding to the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners, depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be used effectively to train models that accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed machine-learning algorithm that implements a game-theoretic approach to explain the output of a machine-learning model, without requiring an excessively large database for high-performance prediction of atomic charges. We used ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in a protein. Our design produced a computational tool, CaML, which provides an explainable machine-learning framework for annotating the ionic charges of calcium ions in calcium-binding proteins in response to chemical changes in the environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space.
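
The abstract does not name the specific game-theoretic method, so the following is only an illustrative sketch of one standard choice, Shapley-value attribution, computed exactly for a tiny model. The function names, the two features (a coordination number and a mean Ca-O distance), and every coefficient are hypothetical and are not taken from the paper or from CaML.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are held at their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for very small feature sets.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size (excluding i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_charge_model(features):
    """Hypothetical stand-in for a trained charge predictor.

    The coefficients are invented for illustration only.
    """
    coordination_number, mean_ca_o_distance = features
    return (2.0
            - 0.05 * coordination_number
            - 0.10 * mean_ca_o_distance
            + 0.01 * coordination_number * mean_ca_o_distance)


if __name__ == "__main__":
    x = [7.0, 2.4]         # hypothetical binding-site features
    baseline = [6.0, 2.5]  # hypothetical reference site
    phi = shapley_values(toy_charge_model, x, baseline)
    print("attributions:", phi)
    print("sum of attributions:", sum(phi))
    print("prediction change:",
          toy_charge_model(x) - toy_charge_model(baseline))
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that makes this kind of explanation additive. Practical tools approximate these values rather than enumerating every coalition.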

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2d6/11761698/c1e94e99dd83/PRO-34-e70023-g004.jpg
