Link Prediction on a Network of Co-occurring MeSH Terms: Towards Literature-based Discovery.

Authors

Kastrin Andrej, Rindflesch Thomas C, Hristovski Dimitar

Affiliations

Andrej Kastrin, PhD, Faculty of Information Studies, Ljubljanska cesta 31A, SI-8000 Novo Mesto, Slovenia, E-mail:

Publication information

Methods Inf Med. 2016 Aug 5;55(4):340-6. doi: 10.3414/ME15-01-0108. Epub 2016 Jul 20.

Abstract

OBJECTIVES

Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We model the LBD process as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods to predict previously unknown connections between biomedical concepts.

METHODS

We evaluated the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures: common neighbors (CN), Jaccard coefficient (JC), Adamic/Adar index (AA), and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future.
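The four proximity measures named above have standard textbook definitions, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the toy graph, its node names, and the helper `rank_candidates` are assumptions introduced here, and the real MeSH network is far larger.

```python
import math
from itertools import combinations

# Toy undirected graph stored as adjacency sets. Node names are
# hypothetical placeholders, not real MeSH terms.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def common_neighbors(g, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """JC: shared neighbors divided by the size of the combined neighborhood."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def adamic_adar(g, u, v):
    """AA: shared neighbors weighted by 1 / log(degree).

    A shared neighbor is linked to both u and v, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])

def preferential_attachment(g, u, v):
    """PA: product of the two node degrees."""
    return len(g[u]) * len(g[v])

def rank_candidates(g, score):
    """Score every currently unlinked pair and sort from highest to lowest."""
    pairs = [(u, v) for u, v in combinations(sorted(g), 2) if v not in g[u]]
    return sorted(pairs, key=lambda p: score(g, *p), reverse=True)

if __name__ == "__main__":
    for u, v in rank_candidates(graph, adamic_adar):
        print(u, v, round(adamic_adar(graph, u, v), 3))
```

In an unsupervised setting, the highest-ranked unlinked pairs are taken as the most likely future links. AA differs from CN only in down-weighting shared neighbors that are themselves highly connected.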

RESULTS

In the unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In the supervised approach, we evaluated whether the proximity measures can be combined into a single model of link formation that uses all four as predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87).
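AUC, the metric reported here, can be read as the probability that a randomly chosen pair that later formed a link receives a higher score than a randomly chosen pair that did not. The following is a minimal sketch of that pairwise definition, with ties counted as half. The function name `auc` and the example scores and labels are made up for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    scores: predicted link scores, higher meaning "more likely to link".
    labels: 1 if the link actually formed later, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical scores for five candidate pairs and whether each link formed.
    example_scores = [2.0, 1.5, 1.0, 0.5, 0.0]
    example_labels = [1, 0, 1, 0, 0]
    print(round(auc(example_scores, example_labels), 3))
```

The quadratic loop keeps the definition explicit. On a network the size of the one described here, a sort-based rank computation or a library routine would be the practical choice.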

CONCLUSIONS

The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.
