An ensemble approach for research article classification: a case study in artificial intelligence.

Author information

Lu Min, Tang Lie, Zhou Xianke

Affiliations

Hangzhou Science and Technology Information Institute, Hangzhou, Zhejiang, China.

Institute of Computer Innovation, Zhejiang University, Hangzhou, Zhejiang, China.

Publication information

PeerJ Comput Sci. 2024 Dec 10;10:e2521. doi: 10.7717/peerj-cs.2521. eCollection 2024.

Abstract

Classifying research articles in emerging fields is challenging because such fields have complex boundaries, are interdisciplinary, and evolve rapidly. Traditional methods, which rely on manually curated search terms and keyword matching, often have low recall because keyword lists are inherently incomplete. To address this limitation, this study introduces a deep learning-based ensemble approach for article classification in dynamic research areas, using the field of artificial intelligence (AI) as a case study. Our approach applies a decision tree, SciBERT, and regular expression matching to different fields of each article, and uses a support vector machine (SVM) to merge the results of the individual models. We evaluated the method on a manually labeled dataset and found that the combined approach captured around 97% of AI-related articles in the Web of Science (WoS) with a precision of 0.92. This represents a 0.15 increase in F1-score over an existing search-term-based approach. We then performed an ablation study to show that each component of the ensemble contributes to overall performance and that SciBERT outperforms other pre-trained BERT models in this case.
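The ensemble described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which model reads which field, so the assignment here (text model on the title/abstract, decision tree on a category id, regular expression on author keywords) is an assumption. To keep the sketch runnable offline, a TF-IDF plus logistic regression pipeline stands in for SciBERT. The toy records, the regex pattern, and the names `RECORDS`, `LABELS`, `regex_score`, and `EnsembleSketch` are all hypothetical.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy records: (title/abstract text, author keywords, category id).
# None of this is data from the paper.
RECORDS = [
    ("Deep neural networks for image recognition", "deep learning; vision", 0),
    ("Reinforcement learning agents for robot control", "machine learning; robotics", 0),
    ("Transformer language models for question answering", "natural language processing", 0),
    ("Graph neural networks for molecule property prediction", "neural network; chemistry", 1),
    ("Soil nitrogen dynamics in temperate forests", "ecology; nitrogen", 2),
    ("Clinical outcomes after hip replacement surgery", "orthopedics; surgery", 3),
    ("Catalytic oxidation of methane over palladium", "catalysis; methane", 1),
    ("Monetary policy and inflation expectations", "economics; inflation", 4),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = AI-related, 0 = not AI-related

# Assumed keyword pattern; the paper's actual expressions are not given in the abstract.
AI_PATTERN = re.compile(r"\b(machine learning|deep learning|neural network)s?\b", re.IGNORECASE)


def regex_score(keywords: str) -> float:
    """Return 1.0 if the keyword field matches the AI pattern, else 0.0."""
    return 1.0 if AI_PATTERN.search(keywords) else 0.0


class EnsembleSketch:
    """Three base signals merged by a linear SVM (a stacking-style ensemble)."""

    def __init__(self):
        # Stand-in for SciBERT: TF-IDF features with logistic regression.
        self.text_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        self.tree = DecisionTreeClassifier(max_depth=3, random_state=0)
        self.meta = SVC(kernel="linear")

    def _meta_features(self, records):
        """Build one row per record: [text score, tree score, regex score]."""
        texts = [r[0] for r in records]
        cats = np.array([[r[2]] for r in records])
        return np.column_stack([
            self.text_model.predict_proba(texts)[:, 1],
            self.tree.predict_proba(cats)[:, 1],
            [regex_score(r[1]) for r in records],
        ])

    def fit(self, records, labels):
        y = np.array(labels)
        self.text_model.fit([r[0] for r in records], y)
        self.tree.fit(np.array([[r[2]] for r in records]), y)
        # Simplification: the SVM is trained on in-sample base scores.
        # A real stacking setup would use out-of-fold predictions here.
        self.meta.fit(self._meta_features(records), y)
        return self

    def predict(self, records):
        return self.meta.predict(self._meta_features(records))


model = EnsembleSketch().fit(RECORDS, LABELS)
new_record = [("Convolutional neural networks for crop disease detection", "deep learning; agriculture", 2)]
print(model.predict(new_record))
```

The SVM sees only the three base scores, so it learns how much weight to give each signal rather than re-reading the article text. That is the sense in which it merges the results of the individual models.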
