Gurbuz Ozge, Alanis-Lobato Gregorio, Picart-Armada Sergio, Sun Miao, Haslinger Christian, Lawless Nathan, Fernandez-Albert Francesc
Discovery Research Coordination Germany, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany.
Global Computational Biology and Data Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany.
Front Genet. 2022 Mar 14;13:814093. doi: 10.3389/fgene.2022.814093. eCollection 2022.
Indication expansion aims to find new indications for existing targets in order to accelerate the process of bringing a new drug for a disease to market. The rapid increase in data types and data sources for computational drug discovery has fostered the use of semantic knowledge graphs (KGs) for indication expansion through target-centric approaches, in other words, target repositioning. Previously, we developed a novel method to construct a KG for indication expansion studies, with the aim of finding and justifying alternative indications for a target gene of interest. In contrast to other KGs, ours combines human-curated full-text literature and gene expression data from biomedical databases to encode relationships between genes, diseases, and tissues. Here, we assessed the suitability of our KG for explainable target-disease link prediction using a glass-box approach. To evaluate the predictive power of our KG, we applied two kinds of prediction methods, shortest path with tissue information and embedding-based, to a graph constructed with information published before or during 2010. We also obtained random baselines by applying the shortest path predictive methods to KGs with randomly shuffled node labels. Then, we evaluated the accuracy of the top predictions using gene-disease links reported after 2010. In addition, we investigated the contribution of the KG's tissue expression entity to the prediction performance. Our experiments showed that shortest path-based methods significantly outperform the random baselines, and that embedding-based methods in turn outperform the shortest path predictions. Importantly, removing the tissue expression entity from the KG severely impacts the quality of the predictions, especially those produced by the embedding approaches. Finally, since the interpretability of the predictions is crucial in indication expansion, we highlight the advantages of our glass-box model through the examination of example candidate target-disease predictions.
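The evaluation protocol described above (build the graph from links known up to 2010, rank candidate diseases for a target by shortest path length, score the top predictions against links reported later, and compare with a label-shuffled baseline) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy edges, the node naming scheme (`gene:`, `disease:`, `tissue:` prefixes), and all function names are hypothetical, and shuffling labels within each entity type is an assumption about how the random baseline might be built.

```python
import random
from collections import deque

# Hypothetical toy edges: (node_a, node_b, year_first_reported).
EDGES = [
    ("gene:G1", "disease:D1", 2005),
    ("gene:G1", "tissue:T1", 2008),
    ("disease:D2", "tissue:T1", 2009),
    ("gene:G2", "disease:D2", 2007),
    ("gene:G2", "tissue:T2", 2006),
    ("disease:D3", "tissue:T2", 2010),
    ("gene:G1", "disease:D2", 2014),  # held out: reported after the cutoff
]


def build_graph(edges, cutoff):
    """Undirected adjacency sets from edges reported in or before `cutoff`."""
    graph = {}
    for a, b, year in edges:
        if year <= cutoff:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def path_lengths(graph, source):
    """Breadth-first search: shortest path length from `source` to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_diseases(graph, gene):
    """Rank diseases not yet linked to `gene`, closest first (ties by name)."""
    dist = path_lengths(graph, gene)
    candidates = [
        (d, node)
        for node, d in dist.items()
        if node.startswith("disease:") and node not in graph[gene]
    ]
    return [node for _, node in sorted(candidates)]


def hits_at_k(ranking, truth, k):
    """Number of held-out diseases recovered among the top-k predictions."""
    return len(set(ranking[:k]) & truth)


def shuffle_labels(graph, rng):
    """Baseline graph: same topology, labels permuted within each entity type.

    Shuffling within type is an assumption made for this sketch.
    """
    mapping = {}
    for prefix in ("gene:", "disease:", "tissue:"):
        nodes = sorted(n for n in graph if n.startswith(prefix))
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        mapping.update(zip(nodes, shuffled))
    return {mapping[n]: {mapping[m] for m in nbrs} for n, nbrs in graph.items()}
```

In this toy graph, the path from `gene:G1` to `disease:D2` runs through the shared tissue `tissue:T1`, which is the kind of intermediate node a glass-box model can surface as an explanation. Comparing `hits_at_k` on the real graph with its average over many calls to `shuffle_labels` would give a rough analogue of the random baseline comparison.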