Bian Jiantao, Morid Mohammad Amin, Jonnalagadda Siddhartha, Luo Gang, Del Fiol Guilherme
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City, UT, USA.
J Biomed Inform. 2017 Sep;73:95-103. doi: 10.1016/j.jbi.2017.07.015. Epub 2017 Jul 26.
The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet clinicians face critical barriers when retrieving evidence relevant to a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®.
Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count.
The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085).
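For readers unfamiliar with the metric, mean top 20 precision can be illustrated with a minimal sketch. It assumes each search topic yields a ranked list of article identifiers and a set of gold-standard (high impact) identifiers. The function names, the dictionary layout, and the choice to divide by k even when fewer than k articles are retrieved are illustrative assumptions, not details taken from the study.

```python
from statistics import mean


def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked items that appear in the gold standard.

    Assumption: the denominator is always k, so a ranking that returns
    fewer than k items is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def mean_precision_at_k(rankings, gold, k=20):
    """Average precision@k over all topics.

    rankings: dict mapping a topic name to its ranked list of identifiers
    gold:     dict mapping a topic name to its set of gold-standard identifiers
    (both structures are hypothetical, for illustration only)
    """
    return mean(precision_at_k(rankings[t], gold[t], k) for t in rankings)


if __name__ == "__main__":
    # Toy data with made-up identifiers; k=4 keeps the example readable.
    rankings = {
        "topic_a": ["p1", "p2", "p3", "p4"],
        "topic_b": ["p5", "p6", "p7", "p8"],
    }
    gold = {"topic_a": {"p1", "p3"}, "topic_b": {"p9"}}
    print(mean_precision_at_k(rankings, gold, k=4))
```

Under this reading, a mean top 20 precision of 34% corresponds to roughly 7 of the first 20 retrieved studies per topic, on average, being in the gold standard.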
The high impact classifier, using features such as bibliometrics, social media attention, and MEDLINE® metadata, outperformed previous approaches and is a promising alternative for identifying high impact studies for clinical decision support.