GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity.

Affiliations

School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand.

School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand.

Publication information

J Comput Chem. 2024 Sep 5;45(23):2001-2023. doi: 10.1002/jcc.27388. Epub 2024 May 7.

DOI:10.1002/jcc.27388

PMID:38713612

Abstract

The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
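
The abstract describes the architecture only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes a simplified attention step (plain dot-product scores instead of a learned graph attention layer), mean pooling over atoms, concatenation of the pooled graph vector with a fingerprint bit vector, and one linear regression head per target. The function names, the toy three-atom graph, the head weights, and the task names "HER1" and "HER2" are all hypothetical. The masked loss shows one common way a multi-task model can train on compounds that have measurements for only some targets.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_update(features, neighbors):
    """One simplified attention step (assumption: dot-product scores).

    Each atom's new feature vector is an attention-weighted average of
    its own features and those of its bonded neighbors.
    """
    updated = []
    for i, h_i in enumerate(features):
        group = [i] + list(neighbors[i])
        weights = softmax([dot(h_i, features[j]) for j in group])
        updated.append([
            sum(w * features[j][k] for w, j in zip(weights, group))
            for k in range(len(h_i))
        ])
    return updated


def mean_pool(features):
    """Average atom vectors into a single molecule-level vector."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def predict(features, neighbors, fingerprint, heads):
    """Fuse graph and fingerprint representations, then apply one
    linear regression head per task (hypothetical weights)."""
    graph_vec = mean_pool(attention_update(features, neighbors))
    fused = graph_vec + list(fingerprint)  # simple concatenation
    return {task: dot(w, fused) + b for task, (w, b) in heads.items()}


def masked_mse(predictions, labels):
    """Mean squared error over tasks that have a label (None = missing)."""
    errors = [(predictions[t] - y) ** 2
              for t, y in labels.items() if y is not None]
    return sum(errors) / len(errors) if errors else 0.0


# Toy molecule: 3 atoms in a chain, 2 features per atom (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
fingerprint = [1, 0, 1, 1]  # made-up 4-bit fingerprint

# One (weights, bias) pair per task; weight length = 2 graph dims + 4 bits.
heads = {
    "HER1": ([0.5, -0.2, 0.1, 0.0, 0.3, 0.2], 6.0),
    "HER2": ([0.1, 0.4, 0.0, 0.2, -0.1, 0.1], 5.5),
}

preds = predict(features, neighbors, fingerprint, heads)
# This compound has a measured activity for HER1 only.
loss = masked_mse(preds, {"HER1": 7.2, "HER2": None})
print(preds, loss)
```

In this sketch the shared representation (attention step, pooling, fusion) is what multiple tasks would train jointly, and it is also the part that could be reused when transferring to a target with few measurements. How GraphEGFR actually shares or freezes layers is not stated in the abstract.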
