Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China.
School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
BMC Med Inform Decis Mak. 2019 Jan 31;19(Suppl 1):17. doi: 10.1186/s12911-019-0735-x.
The goal of temporal indexing is to assign an occurrence time or time interval to each medical entity in clinical notes, so that all medical entities can be indexed on a unified timeline. Such a timeline could assist the understanding of clinical notes and the further application of medical entities. Several shared tasks on temporal relations for medical entities in English clinical notes have been organized in the past few years, such as the 2012 i2b2 NLP challenge and the 2015 and 2016 Clinical TempEval challenges. Many heuristic rule-based and machine learning-based systems have been developed for these tasks. In recent years, deep neural network models have shown great potential on many problems, including relation classification.
In this paper, we propose a recurrent convolutional neural network (RNN-CNN) model for the temporal indexing task. The model consists of four layers: an input layer, which generates a representation for each context word of a medical entity or temporal expression; an LSTM (long short-term memory) layer, which learns the context information of each word in a sentence and outputs a new word representation sequence; a CNN layer, which extracts meaningful features from a sentence and outputs a new representation for the medical entity or temporal expression; and an output layer, which takes the representations of the medical entity, the temporal expression, and the relation features as input and classifies the temporal relation. Finally, the time or time interval for each medical entity is selected directly according to the probability of each temporal relation predicted by the model.
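The final selection step can be illustrated with a minimal sketch. It assumes the classifier returns, for one medical entity, a probability distribution over relation labels for each candidate temporal expression. The label names (`OVERLAP`, `BEFORE`, `NONE`), the function `select_time`, and the example values are hypothetical and are not taken from the paper, which may apply additional rules.

```python
from typing import Dict, Optional, Tuple

# Hypothetical label meaning "no temporal link"; the paper's label set is not given here.
NO_RELATION = "NONE"


def select_time(
    candidates: Dict[str, Dict[str, float]]
) -> Optional[Tuple[str, str, float]]:
    """Pick the (temporal expression, relation, probability) with the highest probability.

    `candidates` maps each candidate temporal expression to the predicted
    probability of every relation label for a single medical entity.
    Returns None when every candidate is most likely unrelated.
    """
    best: Optional[Tuple[str, str, float]] = None
    for timex, probs in candidates.items():
        # Most probable relation label for this candidate.
        relation, p = max(probs.items(), key=lambda kv: kv[1])
        if relation == NO_RELATION:
            continue  # the classifier thinks this expression is unrelated
        if best is None or p > best[2]:
            best = (timex, relation, p)
    return best


# Illustrative (made-up) probabilities for one entity and two candidate dates.
example = {
    "2014-03-02": {"OVERLAP": 0.81, "BEFORE": 0.09, "NONE": 0.10},
    "2014-03-05": {"OVERLAP": 0.30, "BEFORE": 0.15, "NONE": 0.55},
}
print(select_time(example))
```

In this sketch the second candidate is skipped because its most probable label is the hypothetical "no relation" class, so the entity is indexed to the first date.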
To investigate the performance of our RNN-CNN model on the temporal indexing task, we also evaluated several baseline methods, including rule-based, support vector machine (SVM), convolutional neural network (CNN), and recurrent neural network (RNN) methods. Experiments conducted on a manually annotated corpus (563 clinical notes with 12,611 medical entities and 4,006 temporal expressions) show that the RNN-CNN model achieves the best F1-score of 75.97% for temporal relation classification and the best accuracy of 71.96% for temporal indexing.
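For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1-score over relation instances and an accuracy over indexed entities are commonly computed. The abstract does not state the exact evaluation protocol, so the keying scheme (entity and temporal expression pairs), the function names, and the toy data are assumptions for illustration only.

```python
from typing import Dict, Hashable, Tuple


def relation_prf(
    gold: Dict[Hashable, str], pred: Dict[Hashable, str]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over relation instances.

    Both dicts map an (entity, temporal expression) key to a relation label.
    A prediction counts as correct only if the key exists in `gold` with the same label.
    """
    correct = sum(1 for key, label in pred.items() if gold.get(key) == label)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def indexing_accuracy(gold_time: Dict[str, str], pred_time: Dict[str, str]) -> float:
    """Fraction of gold entities whose selected time matches the gold time."""
    if not gold_time:
        return 0.0
    hits = sum(1 for entity, t in gold_time.items() if pred_time.get(entity) == t)
    return hits / len(gold_time)


# Toy data (made up): four gold relations, three predictions, two of them correct.
gold = {("e1", "t1"): "OVERLAP", ("e2", "t1"): "BEFORE",
        ("e3", "t2"): "OVERLAP", ("e4", "t2"): "AFTER"}
pred = {("e1", "t1"): "OVERLAP", ("e2", "t1"): "AFTER", ("e3", "t2"): "OVERLAP"}
precision, recall, f1 = relation_prf(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Here precision is 2/3 and recall is 2/4, which gives an F1 of 4/7.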
Neural network methods perform much better than the traditional rule-based and SVM-based methods because they can capture more semantic information from the context of medical entities and temporal expressions. In addition, all our methods perform much better on accurate time indexing than on time interval indexing, so improving the performance of time interval indexing will be the main focus of our future work.