Kilicoglu Halil, Demner-Fushman Dina, Rindflesch Thomas C, Wilczynski Nancy L, Haynes R Brian
Department of Computer Science and Software Engineering, Concordia University, 1515 Ste Catherine West, Montréal, QC, H3G 1M8, Canada.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):25-31. doi: 10.1197/jamia.M2996. Epub 2008 Oct 24.
The growing number of topically relevant biomedical publications made readily available by advances in document retrieval poses a challenge to clinicians practicing evidence-based medicine: acquiring and critically appraising the available evidence is increasingly time consuming. This problem could be addressed in part if methods were available to automatically recognize rigorous studies that are immediately applicable in a specific clinical situation. We treat the recognition of studies containing usable clinical advice, among retrieved topically relevant articles, as a binary classification problem. The gold standard used in the development of PubMed clinical query filters forms the basis of our approach. We identify scientifically rigorous studies using supervised machine learning techniques (Naïve Bayes, support vector machine (SVM), and boosting) trained on high-level semantic features, and we combine these methods using an ensemble learning method (stacking). The performance of the learning methods is evaluated using precision, recall, and F1 score, in addition to the area under the receiver operating characteristic (ROC) curve (AUC). Using a training set of 10,000 manually annotated MEDLINE citations and a test set of an additional 2,000 citations, we achieve 73.7% precision and 61.5% recall in identifying rigorous, clinically relevant studies with stacking over five feature-classifier combinations, and 82.5% precision and 84.3% recall in recognizing rigorous studies with a treatment focus with stacking over a word + metadata feature vector. Our results demonstrate that a high-quality gold standard and advanced classification methods can help clinicians acquire the best evidence from the medical literature.
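The abstract reports precision and recall for both tasks but does not state the corresponding F1 scores. As a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall, the values implied by the reported figures can be derived as follows. The function name, dictionary keys, and the derived numbers are illustrative and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract; the keys are illustrative labels.
reported = {
    "rigorous, clinically relevant": (0.737, 0.615),
    "rigorous, treatment focus": (0.825, 0.843),
}

# F1 values derived here from the reported figures (not stated in the abstract).
f1_by_task = {task: f1_score(p, r) for task, (p, r) in reported.items()}

for task, f1 in f1_by_task.items():
    print(f"{task}: F1 = {f1:.3f}")
```

Under this definition the derived F1 is roughly 0.67 for the clinically relevant task and roughly 0.83 for the treatment-focus task. The gap between the two reflects the lower recall on the first task.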