IEEE J Biomed Health Inform. 2022 Jan;26(1):379-387. doi: 10.1109/JBHI.2021.3095478. Epub 2022 Jan 17.
Cohort selection, i.e., determining whether an individual satisfies given selection criteria, is an essential prerequisite for clinical research. Previous work on cohort selection usually treated each selection criterion independently, ignoring not only the meaning of each criterion but also the relations among criteria. To solve these problems, we propose a novel unified machine reading comprehension (MRC) framework. In this framework, we design simple rules to generate a question for each criterion from the cohort selection guidelines, and we treat clues extracted by trigger words from patients' medical records as passages. A series of state-of-the-art MRC models based on BiDAF, BIMPM, BERT, BioBERT, NCBI-BERT, and RoBERTa are deployed to determine whether each question-passage pair matches. We also introduce a cross-criterion attention mechanism over the representations of question-passage pairs to model the relations among cohort selection criteria. Results on two datasets, the cohort selection dataset of the 2018 National NLP Clinical Challenges (N2C2) and a dataset derived from MIMIC-III, show that our NCBI-BERT MRC model with the cross-criterion attention mechanism achieves the highest micro-averaged F1-scores: 0.9070 on the N2C2 dataset and 0.8353 on the MIMIC-III dataset. On the N2C2 dataset it is competitive with the best system, which relies on a large number of rules defined by medical experts. Comparing these two models, we find that the NCBI-BERT MRC model performs worse mainly on criteria involving mathematical logic. When replacing the NCBI-BERT MRC model with rules on some of these mathematical-logic criteria on the N2C2 dataset, we obtain a new benchmark with an F1-score of 0.9163, indicating that rules can easily be integrated into MRC models for further improvement.
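To make the framing concrete, the sketch below shows one plausible reading of the abstract's pipeline: each criterion becomes a rule-generated question, trigger-word clues from the record become the paired passage, a BERT-style encoder produces one representation per question-passage pair, and a self-attention layer across those per-criterion representations models the relations among criteria before a binary meets/does-not-meet decision. This is not the authors' code; the encoder checkpoint ("bert-base-uncased" as a stand-in for NCBI-BERT), the toy questions and passages, the attention layout, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of criterion-as-MRC with cross-criterion attention.
# All names and hyperparameters are assumptions, not the paper's setup.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class CrossCriterionMRC(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", n_heads=8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Self-attention across the per-criterion pair representations,
        # so each criterion's decision can condition on the others.
        self.cross_attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)  # meets / does not meet

    def forward(self, input_ids, attention_mask):
        # input_ids: (n_criteria, seq_len), one (question, passage) pair
        # per selection criterion for a single patient.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pair_repr = out.last_hidden_state[:, 0]      # [CLS] vector per pair
        h = pair_repr.unsqueeze(0)                   # (1, n_criteria, hidden)
        attended, _ = self.cross_attn(h, h, h)       # relations among criteria
        return self.classifier(attended.squeeze(0))  # (n_criteria, 2) logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Hypothetical rule-generated questions and trigger-word passages.
questions = ["Has the patient taken aspirin?",
             "Does the patient have diabetes?"]
passages = ["... aspirin 81 mg daily ...",
            "... history of type 2 diabetes mellitus ..."]
batch = tokenizer(questions, passages, padding=True, truncation=True,
                  return_tensors="pt")
model = CrossCriterionMRC()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.argmax(dim=-1))  # one eligibility decision per criterion
```

Attending across criteria, rather than classifying each pair in isolation, is one simple way to realize the abstract's point that criteria are interdependent (e.g., a medication criterion may inform a related diagnosis criterion); the paper's actual architecture may differ.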