Discriminating between empirical studies and nonempirical works using automated text classification.

Affiliations

Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, Canada.

EPPI-Centre, University College London Institute of Education, London, UK.

Publication information

Res Synth Methods. 2018 Dec;9(4):587-601. doi: 10.1002/jrsm.1317. Epub 2018 Aug 29.

DOI: 10.1002/jrsm.1317
PMID: 30103261

Abstract

OBJECTIVE

Identify the most performant automated text classification method (eg, algorithm) for differentiating empirical studies from nonempirical works in order to facilitate systematic mixed studies reviews.

METHODS

The algorithms were trained and validated with 8050 database records, which had previously been manually categorized as empirical or nonempirical. A Boolean mixed filter developed for filtering MEDLINE records (title, abstract, keywords, and full texts) was used as a baseline. The set of features (eg, characteristics from the data) included observable terms and concepts extracted from a metathesaurus. The efficiency of the approaches was measured using sensitivity, precision, specificity, and accuracy.
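The four measures named above can be computed from a confusion matrix of manual labels against predicted labels. The following is a minimal sketch, not the authors' code: the keyword list in `EMPIRICAL_TERMS`, the `boolean_filter` helper, and the six toy records are all hypothetical stand-ins, since the abstract does not give the terms of the published Boolean mixed filter.

```python
from typing import Dict, List

# Hypothetical keyword list for illustration only; the terms of the
# published Boolean mixed filter are not stated in the abstract.
EMPIRICAL_TERMS = ("randomized", "cohort", "interview", "survey", "participants")


def boolean_filter(text: str) -> bool:
    """Flag a record as empirical if any illustrative term appears (logical OR)."""
    lowered = text.lower()
    return any(term in lowered for term in EMPIRICAL_TERMS)


def screening_metrics(y_true: List[bool], y_pred: List[bool]) -> Dict[str, float]:
    """Compute sensitivity, precision, specificity, and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # empirical, flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # nonempirical, not flagged
    fp = sum(1 for t, p in pairs if not t and p)      # nonempirical, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # empirical, missed

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy records (title, manual label: True = empirical), invented for this sketch.
records = [
    ("A cohort study of statin use in older adults", True),
    ("Survey of nurses' attitudes toward triage", True),
    ("Editorial: the future of primary care", False),
    ("Commentary on a randomized trial of aspirin", False),
    ("Qualitative exploration of caregiver burden", True),
    ("Letter to the editor regarding journal policy", False),
]
y_true = [label for _, label in records]
y_pred = [boolean_filter(title) for title, _ in records]
metrics = screening_metrics(y_true, y_pred)
```

On this toy set the keyword filter makes one false-positive error (a commentary that mentions a randomized trial) and one false-negative error (a qualitative study with none of the listed terms), which is the kind of mistake a fixed Boolean baseline is prone to.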

RESULTS

The decision trees algorithm demonstrated the highest performance, surpassing the accuracy of the Boolean mixed filter by 30%. The use of full texts did not result in significant gains compared with title, abstract, keywords, and records. Results also showed that mixing concepts with observable terms can improve the classification.
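To illustrate what a decision tree over term features looks like, here is a small pure-Python sketch that splits on the presence or absence of title words using Gini impurity. It is an assumption-laden toy, not the study's implementation: the abstract does not specify the tree variant, the feature encoding, or any hyperparameters, and the stopword list, `max_depth` value, and training titles below are invented for illustration.

```python
from typing import List, Optional, Set, Tuple

Record = Tuple[Set[str], bool]  # (terms present in the record, is_empirical)

# Hypothetical stopword list so the tree does not split on function words.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> Set[str]:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def gini(labels: List[bool]) -> float:
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


class Node:
    def __init__(self, prediction: bool, term: Optional[str] = None,
                 present: Optional["Node"] = None, absent: Optional["Node"] = None):
        self.prediction = prediction  # majority label at this node
        self.term = term              # term tested here; None for a leaf
        self.present = present        # subtree when the term occurs
        self.absent = absent          # subtree when it does not


def build_tree(records: List[Record], depth: int = 0, max_depth: int = 3) -> Node:
    labels = [y for _, y in records]
    majority = sum(labels) * 2 >= len(labels)
    if depth == max_depth or gini(labels) == 0.0:
        return Node(majority)
    best_term, best_score = None, gini(labels)
    vocabulary = sorted(set().union(*(terms for terms, _ in records)))
    for term in vocabulary:
        with_term = [y for terms, y in records if term in terms]
        without = [y for terms, y in records if term not in terms]
        if not with_term or not without:
            continue
        # Weighted impurity of the two children; lower is better.
        score = (len(with_term) * gini(with_term)
                 + len(without) * gini(without)) / len(records)
        if score < best_score:
            best_term, best_score = term, score
    if best_term is None:
        return Node(majority)
    present = [r for r in records if best_term in r[0]]
    absent = [r for r in records if best_term not in r[0]]
    return Node(majority, best_term,
                build_tree(present, depth + 1, max_depth),
                build_tree(absent, depth + 1, max_depth))


def predict(node: Node, terms: Set[str]) -> bool:
    while node.term is not None:
        node = node.present if node.term in terms else node.absent
    return node.prediction


# Toy training titles (True = empirical), invented for this sketch.
training = [
    ("Cohort study of statin use in adults", True),
    ("Interview study with family caregivers", True),
    ("Randomized trial of aspirin", True),
    ("Editorial on the future of primary care", False),
    ("Commentary on aspirin trial reporting", False),
    ("Letter regarding journal policy", False),
]
tree = build_tree([(tokenize(title), label) for title, label in training])

is_empirical_a = predict(tree, tokenize("Qualitative interview study of nurses"))
is_empirical_b = predict(tree, tokenize("Editorial: rethinking peer review"))
```

On these six titles the tree first tests for "study" and then for "randomized". Concepts mapped from a metathesaurus could, under the same encoding, be added to each record's term set as extra tokens, which is one way to read the abstract's remark about mixing concepts with observable terms.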

SIGNIFICANCE

Screening of records, identified in bibliographic databases, for relevant studies to include in systematic reviews can be accelerated with automated text classification.

Similar articles

1. Discriminating between empirical studies and nonempirical works using automated text classification.
   Res Synth Methods. 2018 Dec;9(4):587-601. doi: 10.1002/jrsm.1317. Epub 2018 Aug 29.
2. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE.
   Cochrane Database Syst Rev. 2013 Sep 11;2013(9):MR000022. doi: 10.1002/14651858.MR000022.pub3.
3. Search strategies to identify observational studies in MEDLINE and Embase.
   Cochrane Database Syst Rev. 2019 Mar 12;3(3):MR000041. doi: 10.1002/14651858.MR000041.pub2.
4. Searching Embase and MEDLINE by using only major descriptors or title and abstract fields: a prospective exploratory study.
   Syst Rev. 2018 Nov 20;7(1):200. doi: 10.1186/s13643-018-0864-9.
5. Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews.
   Syst Rev. 2020 Dec 13;9(1):293. doi: 10.1186/s13643-020-01520-5.
6. Screening for in vitro systematic reviews: a comparison of screening methods and training of a machine learning classifier.
   Clin Sci (Lond). 2023 Jan 31;137(2):181-193. doi: 10.1042/CS20220594.
7. Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide.
   Res Synth Methods. 2018 Dec;9(4):602-614. doi: 10.1002/jrsm.1287. Epub 2018 Feb 7.
8. A comparison of metrics and performance characteristics of different search strategies for article retrieval for a systematic review of the global epidemiology of kidney and urinary diseases.
   BMC Med Res Methodol. 2018 Oct 19;18(1):110. doi: 10.1186/s12874-018-0569-8.
9. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
10. Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews.
    J Clin Epidemiol. 2021 May;133:140-151. doi: 10.1016/j.jclinepi.2020.11.003. Epub 2020 Nov 7.

Cited by

1. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed.
   Syst Rev. 2024 Jul 8;13(1):174. doi: 10.1186/s13643-024-02592-3.
2. Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review.
   Dela J Public Health. 2023 Nov 30;9(4):40-47. doi: 10.32481/djph.2023.11.008. eCollection 2023 Nov.
3. Identifying social science engagement within agroecology: Classifying transdisciplinary literature with a semi-automated textual classification method.
   PLoS One. 2023 Feb 3;18(2):e0278991. doi: 10.1371/journal.pone.0278991. eCollection 2023.