Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles.

Authors

Kim Seunghee, Choi Jinwook

Affiliation

Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Korea.

Publication information

Healthc Inform Res. 2012 Mar;18(1):18-28. doi: 10.4258/hir.2012.18.1.18. Epub 2012 Mar 31.

DOI:10.4258/hir.2012.18.1.18

PMID:22509470

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3324751/

Abstract

OBJECTIVES

Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high-quality articles for new procedure or drug SRs.

METHODS

Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, which used a support vector machine, were trained on the combined even data of the other topics, excluding the target topic. Models trained on the combination of included and commonly excluded articles were compared with models trained on the combination of included and excluded articles. Accuracy was used as the measure of comparison.
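The cross-topic setup described above can be illustrated with a minimal sketch. It only shows how training data might be pooled from every topic except the target one, and how accuracy is computed. The `Article` record, the label strings, and the `build_training_set` helper are hypothetical names introduced for illustration. Treating "excluded" and "commonly excluded" as separate labels that the caller combines is an assumption, not the study's annotation scheme. The support vector machine itself is not shown.

```python
from collections import namedtuple

# Hypothetical record layout; the field names and label strings are
# assumptions for illustration, not the study's annotation format.
Article = namedtuple("Article", ["topic", "text", "label"])

INCLUDED = "included"
COMMONLY_EXCLUDED = "commonly_excluded"
EXCLUDED = "excluded"


def build_training_set(articles, target_topic, negative_labels):
    """Pool articles from every topic except target_topic.

    Returns (texts, y): y is 1 for included articles and 0 for articles
    whose label is in negative_labels. Articles with any other label
    are skipped.
    """
    texts, y = [], []
    for article in articles:
        if article.topic == target_topic:
            continue  # the target topic is held out of training
        if article.label == INCLUDED:
            texts.append(article.text)
            y.append(1)
        elif article.label in negative_labels:
            texts.append(article.text)
            y.append(0)
    return texts, y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data, invented purely to exercise the helpers.
sample = [
    Article("topic_a", "randomized trial of a procedure", INCLUDED),
    Article("topic_a", "editorial comment", COMMONLY_EXCLUDED),
    Article("topic_b", "randomized trial of a drug", INCLUDED),
    Article("topic_b", "off-topic report", EXCLUDED),
    Article("topic_b", "letter to the editor", COMMONLY_EXCLUDED),
    Article("topic_c", "trial held out for testing", INCLUDED),
]

# Condition 1: included + commonly excluded articles from other topics.
texts_ce, y_ce = build_training_set(sample, "topic_c", {COMMONLY_EXCLUDED})
# Condition 2: included + all excluded articles from other topics.
texts_all, y_all = build_training_set(
    sample, "topic_c", {COMMONLY_EXCLUDED, EXCLUDED}
)
print(len(texts_ce), len(texts_all))
```

In a fuller pipeline, the pooled texts would be converted to feature vectors and passed to a classifier, and `accuracy` would be evaluated on the held-out target topic for each condition.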

RESULTS

On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when the classification models trained on the combination of included and commonly excluded articles were used. The system using the combination of included and commonly excluded articles performed better than the one using the combination of included and excluded articles in all of the procedure topics.

CONCLUSIONS

Automatic classification of rigorous articles using machine learning can reduce the workload of experts who perform systematic reviews when topic-specific data are scarce. In particular, the system is more effective when the combination of included and commonly excluded articles is used.

Similar articles

An SVM-based high-quality article classifier for systematic reviews.

J Biomed Inform. 2014 Feb;47:153-9. doi: 10.1016/j.jbi.2013.10.005. Epub 2013 Oct 29.

Cross-topic learning for work prioritization in systematic review creation and update.

J Am Med Inform Assoc. 2009 Sep-Oct;16(5):690-704. doi: 10.1197/jamia.M3162. Epub 2009 Jun 30.

Automatic classification of literature in systematic reviews on food safety using machine learning.

Curr Res Food Sci. 2021 Dec 26;5:84-95. doi: 10.1016/j.crfs.2021.12.010. eCollection 2022.

Reducing workload in systematic review preparation using automated citation classification.

J Am Med Inform Assoc. 2006 Mar-Apr;13(2):206-19. doi: 10.1197/jamia.M1929. Epub 2005 Dec 15.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Artificial Intelligence in Pharmacoepidemiology: A Systematic Review. Part 1-Overview of Knowledge Discovery Techniques in Artificial Intelligence.

Front Pharmacol. 2020 Jul 16;11:1028. doi: 10.3389/fphar.2020.01028. eCollection 2020.

[Volume and health outcomes: evidence from systematic reviews and from evaluation of Italian hospital data].

Epidemiol Prev. 2013 Mar-Jun;37(2-3 Suppl 2):1-100.

A new algorithm for reducing the workload of experts in performing systematic reviews.

J Am Med Inform Assoc. 2010 Jul-Aug;17(4):446-53. doi: 10.1136/jamia.2010.004325.

Cited by

Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review.

Dela J Public Health. 2023 Nov 30;9(4):40-47. doi: 10.32481/djph.2023.11.008. eCollection 2023 Nov.

Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews.

Syst Rev. 2020 Dec 13;9(1):293. doi: 10.1186/s13643-020-01520-5.

SWIFT-Active Screener: Accelerated document screening through active learning and integrated recall estimation.

Environ Int. 2020 May;138:105623. doi: 10.1016/j.envint.2020.105623. Epub 2020 Mar 20.

Examining the Distribution, Modularity, and Community Structure in Article Networks for Systematic Reviews.

AMIA Annu Symp Proc. 2015 Nov 5;2015:1927-36. eCollection 2015.

Using MEDLINE Elemental Similarity to Assist in the Article Screening Process for Systematic Reviews.

JMIR Med Inform. 2015 Aug 31;3(3):e28. doi: 10.2196/medinform.3982.

Using text mining for study identification in systematic reviews: a systematic review of current approaches.

Syst Rev. 2015 Jan 14;4(1):5. doi: 10.1186/2046-4053-4-5.

Clinical care improvement with use of health information technology focusing on evidence based medicine.

Healthc Inform Res. 2012 Sep;18(3):164-70. doi: 10.4258/hir.2012.18.3.164. Epub 2012 Sep 30.
