Kim Seunghee, Choi Jinwook
Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Korea.
Healthc Inform Res. 2012 Mar;18(1):18-28. doi: 10.4258/hir.2012.18.1.18. Epub 2012 Mar 31.
Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high-quality articles for new procedure or drug SRs.
Test collections were built from the annotated reference files of 19 procedure and 15 drug systematic reviews. For each topic, a support vector machine classification model was trained on the combined even data of all other topics, excluding the target topic itself. Models trained on the combination of included and commonly excluded articles were compared with models trained on the combination of included and excluded articles. Accuracy was the measure of comparison.
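The evaluation scheme described above (train on every other topic, test on the held-out topic, compare by accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy `topics` data, the function names, and the word-vote classifier are all hypothetical. The word-vote classifier is only a stand-in for the support vector machine used in the study; any trainer with the same interface, for example one wrapping an SVM library, could be passed as `train_fn`.

```python
from collections import Counter

# Hypothetical toy data: topic -> list of (text, label); 1 = included, 0 = excluded.
topics = {
    "topic_a": [("randomized controlled trial of stent placement", 1),
                ("case report of rare complication", 0)],
    "topic_b": [("randomized trial of bypass surgery outcomes", 1),
                ("editorial comment on surgery", 0)],
    "topic_c": [("double blind randomized trial of statin", 1),
                ("letter and case report on statin", 0)],
}

def leave_one_topic_out(topics, target):
    """Pool the labeled articles of every topic except `target` for training."""
    train = [ex for name, examples in topics.items()
             if name != target for ex in examples]
    test = list(topics[target])
    return train, test

def train_word_vote(train):
    """Stand-in classifier (NOT an SVM): each word votes +1 for every included
    training article it appears in and -1 for every excluded one."""
    weights = Counter()
    for text, label in train:
        for word in set(text.lower().split()):
            weights[word] += 1 if label == 1 else -1

    def predict(text):
        score = sum(weights[w] for w in set(text.lower().split()))
        return 1 if score > 0 else 0

    return predict

def accuracy(predict, test):
    """Fraction of held-out articles whose predicted label matches the annotation."""
    correct = sum(1 for text, label in test if predict(text) == label)
    return correct / len(test)

def evaluate(topics, train_fn=train_word_vote):
    """Hold out each topic in turn and report accuracy on it."""
    results = {}
    for target in topics:
        train, test = leave_one_topic_out(topics, target)
        results[target] = accuracy(train_fn(train), test)
    return results

if __name__ == "__main__":
    for topic, acc in evaluate(topics).items():
        print(topic, acc)
```

To compare the two training combinations from the study under this sketch, one would call `evaluate` twice: once on data whose negative examples are the commonly excluded articles and once on data whose negatives are all excluded articles, then compare the per-topic accuracies.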
On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when the classification models trained on the combination of included and commonly excluded articles were used. The system using the combination of included and commonly excluded articles performed better than the one using the combination of included and excluded articles in all of the procedure topics.
Automatic classification of rigorous articles using machine learning can reduce the workload of experts performing systematic reviews when topic-specific data are scarce. In particular, the system will be more effective when the combination of included and commonly excluded articles is used.