Department of Preventive Medicine, Northwestern University, Chicago, IL, USA.
AI Foundations, IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA.
J Am Med Inform Assoc. 2018 Jan 1;25(1):93-98. doi: 10.1093/jamia/ocx090.
We propose Segment Convolutional Neural Networks (Seg-CNNs) for classifying relations from clinical notes. Seg-CNNs use only word-embedding features without manual feature engineering. Unlike typical CNN models, relations between 2 concepts are identified by simultaneously learning separate representations for text segments in a sentence: preceding, concept1, middle, concept2, and succeeding. We evaluate Seg-CNN on the i2b2/VA relation classification challenge dataset. We show that Seg-CNN achieves a state-of-the-art micro-average F-measure of 0.742 for overall evaluation, 0.686 for classifying medical problem-treatment relations, 0.820 for medical problem-test relations, and 0.702 for medical problem-medical problem relations. We demonstrate the benefits of learning segment-level representations. We show that medical domain word embeddings help improve relation classification. Seg-CNNs can be trained quickly for the i2b2/VA dataset on a graphics processing unit (GPU) platform. These results support the use of CNNs computed over segments of text for classifying medical relations, as they show state-of-the-art performance while requiring no manual feature engineering.
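The five-segment partition described above (preceding, concept1, middle, concept2, succeeding) can be illustrated with a minimal Python sketch. This is not the authors' preprocessing code: the function name `split_into_segments`, the half-open token-span convention, and the example sentence are assumptions made for illustration. It also assumes the two concepts do not overlap and treats whichever appears first in the sentence as concept1.

```python
from typing import Dict, List, Tuple

# Hypothetical convention: a concept is a half-open token span [start, end).
Span = Tuple[int, int]


def split_into_segments(
    tokens: List[str], concept_a: Span, concept_b: Span
) -> Dict[str, List[str]]:
    """Partition a tokenized sentence into five segments around two concepts.

    Assumption: the spans do not overlap; the one that starts earlier in the
    sentence is treated as concept1.
    """
    (s1, e1), (s2, e2) = sorted([concept_a, concept_b])
    if not (0 <= s1 < e1 <= s2 < e2 <= len(tokens)):
        raise ValueError(
            "concept spans must be non-empty, non-overlapping, and inside the sentence"
        )
    return {
        "preceding": tokens[:s1],
        "concept1": tokens[s1:e1],
        "middle": tokens[e1:s2],
        "concept2": tokens[s2:e2],
        "succeeding": tokens[e2:],
    }


# Illustrative sentence (not taken from the i2b2/VA dataset).
tokens = "The patient was given aspirin for his chest pain yesterday".split()
segments = split_into_segments(tokens, (4, 5), (7, 9))
for name, words in segments.items():
    print(f"{name:>10}: {words}")
```

In a segment-based model, each of the five token lists would then be mapped to word embeddings and given its own learned representation before the relation is classified. Segments may be empty (for example, when a concept starts the sentence), so any downstream model would need to handle zero-length input, for instance by padding.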