Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

Authors

Cohen Aaron M, Smalheiser Neil R, McDonagh Marian S, Yu Clement, Adams Clive E, Davis John M, Yu Philip S

Affiliations

Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239 USA.

Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612 USA.

Publication information

J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Abstract

OBJECTIVE

For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT.

MATERIALS AND METHODS

The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article.
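
The forward selection step described above can be illustrated with a minimal sketch. This is not the authors' code: the feature set names, the toy gain table, and the `min_gain` stopping rule are all hypothetical. In a real run, `score` would train a classifier (for example an SVM) on the chosen feature sets and return a cross-validated metric such as AUC.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-4,
) -> Tuple[List[str], float]:
    """Greedy forward selection over named feature sets.

    Starting from an empty selection, repeatedly add the candidate that
    gives the best score, and stop when the best gain is below min_gain.
    """
    remaining = list(candidates)
    selected: List[str] = []
    best = float("-inf")
    while remaining:
        # Score every remaining candidate added to the current selection.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_score, trial_name = max(trials)
        if trial_score - best < min_gain:
            break  # no candidate improves the score enough
        selected.append(trial_name)
        remaining.remove(trial_name)
        best = trial_score
    return selected, best


# Hypothetical stand-in for "train a model and measure AUC":
# each feature set adds a fixed amount to a 0.5 baseline.
TOY_GAIN = {
    "citation_fields": 0.15,
    "abstract_words": 0.25,
    "mesh_terms": 0.05,
    "page_count": 0.00,  # an uninformative feature set
}


def toy_score(features: FrozenSet[str]) -> float:
    return 0.5 + sum(TOY_GAIN[f] for f in features)


selected, best = forward_select(TOY_GAIN, toy_score)
print(selected, round(best, 3))
```

With this toy table, the uninformative feature set is never added, because including it does not raise the score.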

RESULTS

The model achieved an area under the receiver operating characteristic curve of 0.973 and mean squared error of 0.013 on the held out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created, and performs almost as well.
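
The two metrics reported above can be computed from a list of true labels and predicted confidences, as sketched below. The sketch assumes that the mean squared error is taken between the predicted confidence and the 0/1 RCT label, which the abstract does not spell out. The labels and scores are made-up illustration data, not results from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive article is scored above a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared difference between predicted confidence and the 0/1 label."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Made-up example: 1 = RCT, 0 = not an RCT.
example_labels = [1, 1, 0, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.2, 0.1]

print("AUC:", roc_auc(example_labels, example_scores))
print("MSE:", mean_squared_error(example_labels, example_scores))
```

The pairwise AUC loop is quadratic in the number of articles, so for a MEDLINE-scale test set a rank-based implementation from an established library would be the practical choice.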

DISCUSSION

Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified.

CONCLUSION

Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
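
The difference between a continuous confidence and a yes/no tag can be sketched as follows. The article identifiers, confidence values, and the 0.5 cutoff are hypothetical, and the sketch does not call the web service mentioned above. It only shows that a ranked list lets a reviewer choose a cutoff to suit the task, while a fixed yes/no tag does not.

```python
from typing import Dict, List, Tuple


def rank_by_confidence(confidences: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (article_id, confidence) pairs, most likely RCT first."""
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)


def select_for_review(confidences: Dict[str, float], min_confidence: float) -> List[str]:
    """Keep article ids at or above a reviewer-chosen confidence cutoff,
    in ranked order."""
    return [
        article_id
        for article_id, conf in rank_by_confidence(confidences)
        if conf >= min_confidence
    ]


# Hypothetical confidences for four articles.
example = {"article_a": 0.97, "article_b": 0.08, "article_c": 0.55, "article_d": 0.31}

# A recall-oriented reviewer can lower the cutoff to catch borderline articles.
print(select_for_review(example, 0.5))
print(select_for_review(example, 0.25))
```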
