Department of Electronic Engineering, Tsinghua University, Beijing, China.
Tsinghua-iFlytek Joint Laboratory, Beijing, China.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):54. doi: 10.1186/s12911-019-0756-5.
Medical event detection in the narrative clinical notes of electronic health records (EHRs) is an information extraction task: read the text and identify the events it describes. Most previous work on medical event detection treats the task as concept extraction at word granularity, which ignores the overall structure of a clinical note. In this work, we treat each clinical note as a sequence of short sentences and propose an end-to-end deep neural network framework.
We redefined the task as sequence labelling at short-sentence granularity and proposed a novel tag system accordingly. The dataset was derived from a Grade-A tertiary hospital and consists of 2000 clinical notes annotated according to our proposed tag system. The proposed end-to-end deep neural network framework consists of a feature extractor and a sequence labeller, and we explored different implementations of each. We additionally proposed a smoothed Viterbi decoder as a sequence labeller that requires no additional parameter training, which can be a good alternative to a conditional random field (CRF) when computing resources are limited.
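To illustrate how a decoder can label a sequence of short sentences without training extra parameters, the following is a minimal sketch, not the authors' implementation. It assumes that "smoothed" means add-k smoothing of tag transition counts taken from the annotated notes, and that the feature extractor supplies per-sentence log-probabilities; the function names, the integer tag ids, and the value of `k` are all hypothetical.

```python
import math
from collections import Counter


def estimate_transitions(tag_sequences, num_tags, k=1.0):
    """Count tag bigrams and return add-k smoothed log transition probabilities.

    Assumption: smoothing is additive (add-k); the abstract does not specify it.
    """
    counts = Counter()
    for seq in tag_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
    log_trans = []
    for i in range(num_tags):
        row_total = sum(counts[(i, j)] for j in range(num_tags)) + k * num_tags
        log_trans.append(
            [math.log((counts[(i, j)] + k) / row_total) for j in range(num_tags)]
        )
    return log_trans


def viterbi(log_emissions, log_trans):
    """Return the highest-scoring tag path for one clinical note.

    log_emissions[t][j] is the classifier's log-probability of tag j
    for the t-th short sentence.
    """
    num_tags = len(log_trans)
    score = list(log_emissions[0])
    backpointers = []
    for emit in log_emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this position.
            best_i = max(range(num_tags), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy usage: two tags, transitions counted from two annotated tag sequences.
log_trans = estimate_transitions([[0, 0, 1, 1], [0, 1, 1]], num_tags=2)
emissions = [[0.1, 0.9], [0.55, 0.45], [0.1, 0.9]]
log_emissions = [[math.log(p) for p in row] for row in emissions]
path = viterbi(log_emissions, log_trans)
```

In this toy case, labelling each sentence independently would tag the middle sentence as 0, whereas the decoder keeps it at 1 because the counted transitions make a 1 to 0 to 1 path unlikely. The only quantities involved are counts, so nothing is trained by gradient descent.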
Our sequence labelling models were compared with four baselines that treat the task as text classification of short sentences. Experimental results showed that our approach significantly outperformed the baselines. The best result was obtained with the convolutional neural network (CNN) feature extractor and the sequential CRF sequence labeller, achieving an accuracy of 92.6%. Our proposed smoothed Viterbi decoder achieved a comparable accuracy of 90.07% with fewer trainable parameters and gave more balanced performance across all categories, which suggests better generalization ability.
Evaluated on our annotated dataset, the comparison results demonstrated the effectiveness of our approach for medical event detection in Chinese clinical notes of EHRs. The best feature extractor was the CNN feature extractor, and the best sequence labeller was the sequential CRF decoder. The results also verified empirically that our proposed smoothed Viterbi decoder can bring better generalization ability while achieving performance comparable to that of the sequential CRF decoder.