Northwestern University, Chicago 60611, IL, USA.
Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago 60611, IL, USA.
BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):71. doi: 10.1186/s12911-019-0781-4.
Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on feature engineering based on rules or knowledge sources, but only a limited number have exploited the representation learning capability of deep learning methods.
In this study, we propose a new approach that combines rule-based features and knowledge-guided deep learning models for effective disease classification. The critical steps of our method include recognizing trigger phrases, using those trigger phrases to predict classes that have very few examples, and training a convolutional neural network (CNN) with word embeddings and Unified Medical Language System (UMLS) entity embeddings.
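The trigger-phrase step can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the trigger lists, the function names `find_trigger` and `rule_label`, and the label codes `N` (absent) and `Q` (questionable) are hypothetical and are not taken from the abstract, which does not list the phrases or rules actually used.

```python
import re

# Hypothetical trigger phrases; the abstract does not list the actual ones.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
UNCERTAIN_TRIGGERS = ["possible", "probable", "rule out", "suspected"]


def find_trigger(sentence, disease, triggers):
    """Return the first trigger that appears before `disease` in `sentence`, else None."""
    lowered = sentence.lower()
    position = lowered.find(disease.lower())
    if position == -1:
        return None  # the disease is not mentioned in this sentence
    prefix = lowered[:position]
    for trigger in triggers:
        # Match whole words or phrases only, so "no" does not fire inside "nodule".
        if re.search(r"\b" + re.escape(trigger) + r"\b", prefix):
            return trigger
    return None


def rule_label(sentence, disease):
    """Label a disease mention 'N' (negated) or 'Q' (questionable), else None.

    None means no rule fired, so the example would be left to a learned model.
    """
    if find_trigger(sentence, disease, NEGATION_TRIGGERS):
        return "N"
    if find_trigger(sentence, disease, UNCERTAIN_TRIGGERS):
        return "Q"
    return None
```

Under these assumptions, rules of this kind only decide the rare classes, where too few examples exist to train on, and every other record is passed to the CNN.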
We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results demonstrate that our method outperforms the state-of-the-art methods.
We showed that the CNN model is powerful for learning effective hidden features and that concept unique identifier (CUI) embeddings are helpful for building clinical text representations. This shows that integrating domain knowledge into CNN models is promising.
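One way to picture how word embeddings and CUI embeddings might jointly form a text representation is the dependency-free sketch below. It is not the authors' architecture: the two-branch layout, the dimensions, the random untrained weights, and the placeholder identifiers such as `CUI_OBESITY` are all assumptions, and the abstract does not say how the two embedding types are combined.

```python
import random

DIM, WIDTH, N_FILTERS = 4, 2, 3  # assumed sizes, chosen only to keep the sketch small
rng = random.Random(0)

word_table, cui_table = {}, {}  # lazily filled embedding lookup tables


def embed(tokens, table):
    """Look up (or randomly initialise) one DIM-dimensional vector per token."""
    for token in tokens:
        if token not in table:
            table[token] = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]
    return [table[token] for token in tokens]


def make_filters():
    """Create N_FILTERS random filters, each spanning WIDTH consecutive vectors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(WIDTH * DIM)]
            for _ in range(N_FILTERS)]


def conv_max_pool(vectors, filters):
    """1D convolution with ReLU, then max-over-time pooling: one value per filter."""
    features = []
    for weights in filters:
        best = 0.0  # starting at 0.0 folds the ReLU into the running max
        for start in range(len(vectors) - WIDTH + 1):
            window = [x for vec in vectors[start:start + WIDTH] for x in vec]
            best = max(best, sum(w * x for w, x in zip(weights, window)))
        features.append(best)
    return features


word_filters, cui_filters = make_filters(), make_filters()


def represent(words, cuis):
    """Concatenate pooled features from the word branch and the CUI branch."""
    return (conv_max_pool(embed(words, word_table), word_filters)
            + conv_max_pool(embed(cuis, cui_table), cui_filters))


# Placeholder CUI names; a real pipeline would obtain CUIs from a UMLS entity linker.
features = represent(["patient", "has", "obesity"], ["CUI_OBESITY", "CUI_DIABETES"])
```

In a trained model the pooled vector would feed a classification layer, and the embeddings and filters would be learned rather than random.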