Discriminating between empirical studies and nonempirical works using automated text classification.

Affiliations

Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, Canada.

EPPI-Centre, University College London Institute of Education, London, UK.

Publication information

Res Synth Methods. 2018 Dec;9(4):587-601. doi: 10.1002/jrsm.1317. Epub 2018 Aug 29.

Abstract

OBJECTIVE

Identify the most performant automated text classification method (eg, algorithm) for differentiating empirical studies from nonempirical works in order to facilitate systematic mixed studies reviews.

METHODS

The algorithms were trained and validated with 8050 database records, which had previously been manually categorized as empirical or nonempirical. A Boolean mixed filter developed for filtering MEDLINE records (title, abstract, keywords, and full texts) was used as a baseline. The set of features (eg, characteristics from the data) included observable terms and concepts extracted from a metathesaurus. The efficiency of the approaches was measured using sensitivity, precision, specificity, and accuracy.
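
The four measures named above follow from a two-by-two confusion matrix that compares manual labels with classifier output. The sketch below is illustrative only and is not the authors' evaluation code; it assumes "empirical" is the positive class, and the names `ConfusionCounts`, `count_outcomes`, and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Counts of classifier outcomes; 'empirical' is assumed to be the positive class."""
    tp: int  # empirical records classified as empirical
    fp: int  # nonempirical records classified as empirical
    tn: int  # nonempirical records classified as nonempirical
    fn: int  # empirical records classified as nonempirical


def count_outcomes(gold: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Tally outcomes from parallel lists of manual labels and predictions."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute sensitivity, precision, specificity, and accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # share of empirical records found
        "precision": _ratio(c.tp, c.tp + c.fp),    # share of positive calls that are right
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of nonempirical records rejected
        "accuracy": _ratio(c.tp + c.tn, total),    # share of all records classified correctly
    }
```

For screening, sensitivity is usually the measure to watch most closely, since a missed empirical study cannot be recovered later in the review, whereas a false positive only costs extra reading time.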

RESULTS

The decision trees algorithm demonstrated the highest performance, surpassing the accuracy of the Boolean mixed filter by 30%. The use of full texts did not result in significant gains compared with title, abstract, keywords, and records. Results also showed that mixing concepts with observable terms can improve the classification.
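
One way to picture "mixing concepts with observable terms" is a feature bag that holds both raw tokens and concept identifiers. The sketch below is a toy illustration under stated assumptions: `TOY_CONCEPTS` is a hypothetical hand-written lookup standing in for a metathesaurus, and the function names are invented here. It does not reproduce the study's feature extraction.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical phrase-to-concept lookup; a real metathesaurus is far larger
# and is not reproduced here.
TOY_CONCEPTS: Dict[str, str] = {
    "randomized controlled trial": "CONCEPT:study_design",
    "interview": "CONCEPT:data_collection",
    "survey": "CONCEPT:data_collection",
}


def term_features(text: str) -> Counter:
    """Count observable terms: lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def concept_features(text: str, lookup: Dict[str, str] = TOY_CONCEPTS) -> Counter:
    """Count concept identifiers whose trigger phrase occurs as whole words."""
    lowered = text.lower()
    counts: Counter = Counter()
    for phrase, concept in lookup.items():
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        if hits:
            counts[concept] += hits
    return counts


def mixed_features(text: str) -> Counter:
    """Combine term counts and concept counts into one feature bag."""
    return term_features(text) + concept_features(text)
```

In this toy setup, "survey" and "interview" stay distinct as terms but also contribute to one shared concept feature, which gives a classifier a signal that generalizes across different surface wordings.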

SIGNIFICANCE

Screening of records, identified in bibliographic databases, for relevant studies to include in systematic reviews can be accelerated with automated text classification.
