Using predicate and provenance information from a knowledge graph for drug efficacy screening.

Authors

Vlietstra Wytze J, Vos Rein, Sijbers Anneke M, van Mulligen Erik M, Kors Jan A

Affiliations

Department of Medical Informatics, Erasmus University Medical Centre, 3015 GE Rotterdam, the Netherlands.

Department of Methodology and Statistics, Maastricht University, 6200 MD Maastricht, the Netherlands.

Publication information

J Biomed Semantics. 2018 Sep 6;9(1):23. doi: 10.1186/s13326-018-0189-6.

Abstract

BACKGROUND

Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge. They represent knowledge as subject-predicate-object triples, in which the predicate indicates the relationship between subject and object. A triple can also contain provenance information, which consists of references to the sources of the triple (e.g. scientific publications or database entries). Knowledge graphs have been used to classify drug-disease pairs for drug efficacy screening, but existing computational methods have often ignored predicate and provenance information. Using this information, we aimed to develop a supervised machine learning classifier and determine the added value of predicate and provenance information for drug efficacy screening. To ensure the biological plausibility of our method, we performed our research at the protein level, where drugs are represented by their drug target proteins and diseases by their disease proteins.
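To make the triple representation concrete, the following is a minimal sketch of a subject-predicate-object triple that carries provenance, together with one possible way to turn the triples linking a drug target protein and a disease protein into features. The class `Triple`, the function `pair_features`, the protein names, the predicate names, and the source identifiers are all hypothetical placeholders chosen for illustration; the abstract does not specify the actual feature set used by the authors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph statement (hypothetical structure, for illustration)."""
    subject: str
    predicate: str          # relationship between subject and object
    obj: str
    provenance: tuple = ()  # references to sources, e.g. publications or database entries


def pair_features(triples, drug_target, disease_protein):
    """Summarise the direct triples between two proteins.

    Assumption: predicate information is captured as a count per predicate,
    and provenance information as the number of distinct sources.
    """
    predicate_counts = Counter()
    sources = set()
    for t in triples:
        # Match the pair regardless of which protein is subject or object.
        if {t.subject, t.obj} == {drug_target, disease_protein}:
            predicate_counts[t.predicate] += 1
            sources.update(t.provenance)
    features = {f"predicate:{p}": n for p, n in predicate_counts.items()}
    features["provenance:n_sources"] = len(sources)
    return features


# Placeholder data; none of these names or identifiers come from the paper.
triples = [
    Triple("PROT_A", "interacts_with", "PROT_B", ("pub:example-1", "db:example-1")),
    Triple("PROT_B", "inhibits", "PROT_A", ("pub:example-2",)),
    Triple("PROT_A", "interacts_with", "PROT_C", ("pub:example-3",)),
]
features = pair_features(triples, "PROT_A", "PROT_B")
print(features)
```

A feature dictionary of this kind could serve as one input row for a supervised classifier; a method that ignores predicates and provenance would only see that the two proteins are connected.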

RESULTS

Using random forests with repeated 10-fold cross-validation, our method achieved an area under the ROC curve (AUC) of 78.1% and 74.3% for two reference sets. We benchmarked against a state-of-the-art knowledge-graph technique that does not use predicate and provenance information, obtaining AUCs of 65.6% and 64.6%, respectively. Classifiers that used only predicate information outperformed classifiers that used only provenance information, and classifiers that used both performed best.
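For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the labels and scores are made up for illustration and are unrelated to the reported results; in practice a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = effective drug-disease pair, 0 = not effective (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for comparing the reported 78.1% and 74.3% against the benchmark's 65.6% and 64.6%.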

CONCLUSION

We conclude that both predicate and provenance information provide added value for drug efficacy screening.
