Hansson Frederik G, Madsen Niklas Gesmar, Hansen Lea G, Jakočiūnas Tadas, Lengger Bettina, Keasling Jay D, Jensen Michael K, Acevedo-Rocha Carlos G, Jensen Emil D
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark.
Biomia ApS, Lersø Parkallé 44, Copenhagen, Denmark.
Nat Commun. 2025 May 3;16(1):4121. doi: 10.1038/s41467-025-59418-6.
Machine learning has revolutionized drug discovery by enabling the exploration of vast, uncharted chemical spaces essential for discovering novel patentable drugs. Despite the critical role of human G protein-coupled receptors in FDA-approved drugs, exhaustive in-distribution drug-target interaction testing across all pairs of human G protein-coupled receptors and known drugs is rare because of significant economic and technical challenges. This often leaves off-target effects unexplored, which poses a considerable risk to drug safety. In contrast to the traditional focus on out-of-distribution exploration (drug discovery), we introduce a neighborhood-to-prediction model termed Chemical Space Neural Networks that leverages network homophily and training-free graph neural networks with labels as features. We show that the predictive accuracy of Chemical Space Neural Networks correlates strongly with network homophily. Thus, using labels as features substantially improves a machine learning model's in-distribution prediction accuracy, which we show by integrating labeled data during inference. We validate these advances in a high-throughput yeast biosensing system (3773 drug-target interactions, 539 compounds, 7 human G protein-coupled receptors) to discover novel drug-target interactions for FDA-approved drugs and to expand the general understanding of how to build reliable predictors that guide experimental verification.
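The two ideas named in the abstract, network homophily and training-free prediction with labels as features, can be illustrated with a minimal sketch. This is not the authors' Chemical Space Neural Networks implementation: the function names `edge_homophily` and `neighbor_label_scores`, the binary labels, and the toy graph are all assumptions made for illustration, and the way a compound graph would be built (for example from chemical similarity) is left open.

```python
import numpy as np


def edge_homophily(adj, labels):
    """Fraction of undirected edges whose two endpoints share a label."""
    rows, cols = np.nonzero(np.triu(adj, k=1))  # count each edge once
    if rows.size == 0:
        return float("nan")
    return float(np.mean(labels[rows] == labels[cols]))


def neighbor_label_scores(adj, labels, known):
    """Training-free score: mean label over each node's labeled neighbors.

    adj    : (n, n) symmetric 0/1 adjacency matrix without self-loops
    labels : (n,) 0/1 labels; entries at unknown nodes are ignored
    known  : (n,) boolean mask, True where a label may be used as a feature
    """
    masked = np.where(known, labels, 0).astype(float)
    votes = adj @ masked                # positive labels among known neighbors
    counts = adj @ known.astype(float)  # number of known neighbors
    scores = np.full(len(labels), np.nan)  # NaN if no labeled neighbor exists
    np.divide(votes, counts, out=scores, where=counts > 0)
    return scores


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6), dtype=int)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1

labels = np.array([1, 1, 1, 0, 0, 0])   # assumed active / inactive labels
known = np.array([True, True, False, True, True, False])

print("edge homophily:", edge_homophily(adj, labels))
print("scores for held-out nodes 2 and 5:",
      neighbor_label_scores(adj, labels, known)[[2, 5]])
```

No parameters are fitted here: the labels of neighboring nodes are read directly at inference time, so a prediction can only be as good as the agreement between neighbors. That is the intuition behind the reported link between accuracy and homophily.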