An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation.

Author information

Zeng Kun, Pan Zhiwei, Xu Yibin, Qu Yingying

Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China.

School of Computer Science, South China Normal University, Guangzhou, China.

Publication information

JMIR Med Inform. 2020 Jul 1;8(7):e17832. doi: 10.2196/17832.

Abstract

BACKGROUND

Eligibility criteria are the main instrument for screening appropriate participants for clinical trials. Automatically analyzing clinical trial eligibility criteria with natural language processing techniques enables digital screening, which can improve recruitment efficiency and reduce the costs of advancing clinical research.

OBJECTIVE

We aimed to create a natural language processing model to automatically classify clinical trial eligibility criteria.

METHODS

We proposed a classifier for short-text eligibility criteria based on ensemble learning, in which a set of pretrained models was integrated. The pretrained models were state-of-the-art deep learning models for training and classification: Bidirectional Encoder Representations from Transformers (BERT), XLNet, and A Robustly Optimized BERT Pretraining Approach (RoBERTa). The classification outputs of these models were combined as new features to train a Light Gradient Boosting Machine (LightGBM) model, which performs the final eligibility criteria classification.
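The stacking step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each fine-tuned base model exposes a function that returns a class-probability vector for one criterion sentence, and the names `meta_features`, `fake_bert`, `fake_xlnet`, and `fake_roberta` are hypothetical. The toy stand-ins replace real transformer models so the example runs with the Python standard library alone.

```python
from typing import Callable, List, Sequence

# Assumption: a base model maps one criterion sentence to a probability
# vector over the label set (e.g., the softmax output of a fine-tuned BERT).
BaseModel = Callable[[str], List[float]]


def meta_features(texts: Sequence[str], base_models: Sequence[BaseModel]) -> List[List[float]]:
    """Concatenate every base model's probability vector into one row per text.

    With M base models and C classes, each row has M * C features; these rows
    are the input to the second-level (meta) classifier.
    """
    rows: List[List[float]] = []
    for text in texts:
        row: List[float] = []
        for model in base_models:
            probs = model(text)
            if abs(sum(probs) - 1.0) > 1e-6:
                raise ValueError("base model output must be a probability vector")
            row.extend(probs)
        rows.append(row)
    return rows


# Hypothetical stand-ins for fine-tuned BERT, XLNet, and RoBERTa over 3 toy classes.
def fake_bert(text: str) -> List[float]:
    return [0.7, 0.2, 0.1] if "age" in text else [0.1, 0.3, 0.6]


def fake_xlnet(text: str) -> List[float]:
    return [0.6, 0.3, 0.1] if "age" in text else [0.2, 0.2, 0.6]


def fake_roberta(text: str) -> List[float]:
    return [0.8, 0.1, 0.1] if "age" in text else [0.1, 0.1, 0.8]


X_meta = meta_features(["age 18-65 years", "pregnant women"], [fake_bert, fake_xlnet, fake_roberta])
print(len(X_meta), len(X_meta[0]))  # 2 rows, 9 features each (3 models x 3 classes)
```

The rows would then serve as the training matrix for the second-level model, for example `lightgbm.LGBMClassifier().fit(X_meta, y)` when LightGBM is installed. In stacking it is common practice to build the meta-training rows from out-of-fold predictions so that the meta-learner never sees base-model outputs on data those models were trained on; the abstract does not say whether the authors did this.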

RESULTS

Our proposed method obtained an accuracy of 0.846, a precision of 0.803, and a recall of 0.817 on a standard data set from a shared task of an international conference. The macro F1 value was 0.807, outperforming the state-of-the-art baseline methods on the shared task.
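Macro F1 averages the per-class F1 scores with equal weight per class, so it need not equal the harmonic mean of averaged precision and recall. For instance, the harmonic mean of the reported precision and recall is 2 × 0.803 × 0.817 / (0.803 + 0.817) ≈ 0.810, slightly above the reported macro F1 of 0.807; the abstract does not state how precision and recall were averaged. The sketch below (hypothetical helper names `per_class_prf` and `macro_scores`, standard library only) computes macro-averaged scores from gold and predicted labels and shows the gap on a toy example.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def per_class_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # gold label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, ...]:
    """Unweighted means of per-class precision, recall, and F1."""
    per_class = per_class_prf(y_true, y_pred)
    n = len(per_class)
    return tuple(sum(s[i] for s in per_class.values()) / n for i in range(3))


y_true = ["A", "A", "A", "B", "C", "C"]
y_pred = ["A", "A", "B", "B", "C", "A"]
p, r, f1 = macro_scores(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.722 0.722 0.667
```

Here every class has F1 = 2/3, so macro F1 is 0.667, while the harmonic mean of macro precision and macro recall is 0.722.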

CONCLUSIONS

We designed a model for classifying short-text eligibility criteria for clinical trials based on multimodel ensemble learning. Our experiments showed that a model ensemble improved performance significantly compared with a single model. Introducing focal loss can reduce the impact of class imbalance and thereby improve performance.
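Focal loss, as commonly formulated, scales cross-entropy by a modulating factor so that well-classified examples contribute less: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The abstract does not give the authors' settings, so the sketch below is a standard multiclass version under assumptions: γ = 2 as a common default, optional per-class weights α, and hypothetical function names.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multiclass focal loss for one example.

    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t), with p_t the softmax
    probability of the true class. gamma = 0 and alpha = None gives cross-entropy.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    weight = alpha[target] if alpha is not None else 1.0  # assumed per-class weight
    return -weight * (1.0 - p_t) ** gamma * log_p_t


easy = [4.0, 0.0, 0.0]  # true class 0 predicted with high confidence
hard = [0.0, 0.5, 0.0]  # true class 0 is not the top prediction
for name, logits in [("easy", easy), ("hard", hard)]:
    ce = focal_loss(logits, target=0, gamma=0.0)
    fl = focal_loss(logits, target=0, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

With γ > 0, a confidently correct example is scaled down far more than a misclassified one, which keeps abundant, easy classes from dominating training when the label distribution is imbalanced.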

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1fd/7367522/033c18cd6e8d/medinform_v8i7e17832_fig1.jpg
