Luo Yuan
Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University, Chicago, IL, United States.
J Biomed Inform. 2017 Aug;72:85-95. doi: 10.1016/j.jbi.2017.07.006. Epub 2017 Jul 8.
We proposed the first models based on recurrent neural networks (more specifically, Long Short-Term Memory - LSTM) for classifying relations from clinical notes. We tested our models on the i2b2/VA relation classification challenge dataset. We showed that our segment LSTM model, with only word embedding features and no manual feature engineering, achieved a micro-averaged F-measure of 0.661 for classifying medical problem-treatment relations, 0.800 for medical problem-test relations, and 0.683 for medical problem-medical problem relations. These results are comparable to those of the state-of-the-art systems on the i2b2/VA relation classification challenge. We compared the segment LSTM model with the sentence LSTM model, and demonstrated the benefits of distinguishing between concept text and context text, and between different contextual parts of the sentence. We also evaluated the impact of word embeddings on the performance of LSTM models and showed that medical-domain word embeddings help improve relation classification. These results support the use of LSTM models for classifying relations between medical concepts, as they achieve performance comparable to previously published systems while requiring no manual feature engineering.
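To make the segment-level setup concrete, the sketch below shows one plausible instantiation (not the paper's code): the sentence is assumed to be split into five segments - preceding context, the first concept, the between-concept context, the second concept, and the following context - each encoded by an LSTM over word embeddings, with the pooled segment representations concatenated and passed to a softmax over relation types. The class name, segment count, pooling choice, and dimensions are illustrative assumptions.

```python
# Minimal sketch of a segment-level LSTM relation classifier (PyTorch).
# Assumes each example is pre-split into five segments of word ids:
# [preceding context, concept 1, between-concept context, concept 2, following context].
import torch
import torch.nn as nn

class SegmentLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, hidden_dim=128,
                 num_relations=8, num_segments=5):
        super().__init__()
        # Word embeddings; in the paper these can be pre-trained on medical text.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # A single LSTM shared across segments (per-segment LSTMs are another option).
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim * num_segments, num_relations)

    def forward(self, segments):
        # segments: list of num_segments tensors, each of shape (batch, seg_len)
        pooled = []
        for seg in segments:
            emb = self.embedding(seg)               # (batch, seg_len, embed_dim)
            out, _ = self.lstm(emb)                 # (batch, seg_len, hidden_dim)
            pooled.append(out.max(dim=1).values)    # max-pool over time steps
        features = torch.cat(pooled, dim=-1)        # (batch, hidden_dim * num_segments)
        return self.classifier(features)            # relation logits

# Toy usage: a batch of 2 examples with 5 segments of varying length.
model = SegmentLSTMClassifier(vocab_size=5000)
segments = [torch.randint(1, 5000, (2, n)) for n in (6, 3, 4, 2, 5)]
logits = model(segments)   # shape (2, 8)
```

A sentence-level LSTM baseline, by contrast, would run one LSTM over the whole sentence without separating concept text from context text; the segment split is what lets the model weigh those parts differently.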