The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Nat Commun. 2024 Oct 15;15(1):8891. doi: 10.1038/s41467-024-53333-y.
Identifying genetic drivers of chronic diseases is necessary for drug discovery. Here, we develop a machine learning-assisted genetic priority score, which we call ML-GPS, that incorporates genetic associations with predicted disease phenotypes to enhance target discovery. First, we construct gradient boosting models to predict 112 chronic disease phecodes in the UK Biobank and analyze associations of predicted and observed phenotypes with common, rare, and ultra-rare variants to model the allelic series. We integrate these associations with existing evidence using gradient boosting with continuous feature encoding to construct ML-GPS, training it to predict drug indications in Open Targets and externally testing it in SIDER. We then generate ML-GPS predictions for 2,362,636 gene-phecode pairs. We find that the use of predicted phenotypes, which identify substantially more genetic associations than observed phenotypes across the allele frequency spectrum, significantly improves the performance of ML-GPS. ML-GPS increases coverage of drug targets, with the top 1% of all scores providing support for 15,077 gene-phecode pairs that previously had no support. ML-GPS can also identify well-known target-disease relationships, promising targets without indicated drugs, and targets for several drugs in clinical trials, including LRRK2 inhibitors for Parkinson's disease and olpasiran for cardiovascular disease.
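The "top 1% of all scores" step can be illustrated with a minimal sketch. It assumes the top 1% is taken over all scored gene-phecode pairs (about 23,626 of the 2,362,636 pairs, if so) and that prior support is available as a simple set of pairs. The function name, the toy gene and phecode labels, and the random scores are all hypothetical and are not taken from the paper or its code.

```python
import math
import random


def top_fraction_newly_supported(scores, prior_support, fraction=0.01):
    """Return pairs ranked in the top `fraction` by score that lack prior support.

    scores: dict mapping (gene, phecode) -> score (hypothetical stand-in for ML-GPS)
    prior_support: set of (gene, phecode) pairs that already had supporting evidence
    """
    # Number of pairs in the top fraction (at least one pair).
    n_top = max(1, math.floor(len(scores) * fraction))
    # Rank pairs from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Keep only top-ranked pairs that had no prior support.
    return [pair for pair in ranked[:n_top] if pair not in prior_support]


# Hypothetical toy data: 100 genes x 10 phecodes with random scores.
random.seed(0)
genes = [f"GENE{i}" for i in range(100)]
phecodes = [f"PHE{j}" for j in range(10)]
scores = {(g, p): random.random() for g in genes for p in phecodes}
# Assume 200 arbitrary pairs already had support from existing evidence.
prior_support = set(random.sample(sorted(scores), 200))

new_pairs = top_fraction_newly_supported(scores, prior_support)
print(f"{len(new_pairs)} top-1% pairs had no prior support")
```

With real data, `scores` would hold the model's prediction for each gene-phecode pair and `prior_support` would be derived from the existing evidence sources. How ties and the exact percentile cutoff are handled is an assumption of this sketch.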