Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles.

Authors

Kim Seunghee, Choi Jinwook

Affiliation

Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Korea.

Publication information

Healthc Inform Res. 2012 Mar;18(1):18-28. doi: 10.4258/hir.2012.18.1.18. Epub 2012 Mar 31.

Abstract

OBJECTIVES

Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high quality articles for new procedure or drug SRs.

METHODS

Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, which used a support vector machine, were trained on the combined data of the other topics, excluding the target topic. Models trained on the combination of included and commonly excluded articles were compared with models trained on the combination of included and excluded articles. Accuracy was used as the measure of comparison.
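The training setup described above, in which the target topic is held out and models are trained on the remaining topics, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the label names ("included", "excluded", "commonly_excluded"), the treatment of those labels as mutually exclusive, and the functions `build_training_set` and `accuracy` are all hypothetical, and the toy records are invented. The support vector machine itself is left out; any classifier could be fitted on the returned texts and labels.

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption, not from the paper):
# (topic, text, label), with label in
# {"included", "excluded", "commonly_excluded"}.
Record = Tuple[str, str, str]


def build_training_set(records: List[Record], target_topic: str,
                       negative_label: str) -> Tuple[List[str], List[int]]:
    """Collect training data from every topic except target_topic.

    Included articles become positives (1); articles carrying
    negative_label become negatives (0). Other records are skipped.
    """
    texts: List[str] = []
    labels: List[int] = []
    for topic, text, label in records:
        if topic == target_topic:
            continue  # hold out the topic under review
        if label == "included":
            texts.append(text)
            labels.append(1)
        elif label == negative_label:
            texts.append(text)
            labels.append(0)
    return texts, labels


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy data, invented purely for illustration.
records = [
    ("topic_a", "randomized trial of stent placement", "included"),
    ("topic_a", "editorial on stents", "commonly_excluded"),
    ("topic_b", "cohort study of statin therapy", "included"),
    ("topic_b", "case report, single patient", "excluded"),
    ("topic_c", "randomized trial of knee surgery", "included"),
]

# Training data for topic_c, using commonly excluded articles as negatives.
texts, labels = build_training_set(records, "topic_c", "commonly_excluded")
```

Calling `build_training_set` a second time with `negative_label="excluded"` would give the comparison condition, and `accuracy` could then be applied to each model's predictions on the held-out topic.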

RESULTS

On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when the classification models were trained on the combination of included and commonly excluded articles. The system using the combination of included and commonly excluded articles performed better than the one using the combination of included and excluded articles in all of the procedure topics.

CONCLUSIONS

Automatic classification of rigorous articles using machine learning can reduce the workload of experts performing systematic reviews when topic-specific data are scarce. In particular, the system is more effective when the combination of included and commonly excluded articles is used.


