Rajendran Suraj, Topaloglu Umit
Wake Forest University School of Medicine, Winston Salem, NC.
Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:507-516. eCollection 2020.
Half a million people in the United States die every year from smoking-related causes. Identifying individuals who are tobacco-dependent is essential for implementing preventive measures. In this study, we investigate the effectiveness of deep learning models in extracting patients' smoking status from clinical progress notes. We built a Natural Language Processing (NLP) pipeline that cleans the progress notes before they are processed by three deep neural networks: a CNN, a unidirectional LSTM, and a bidirectional LSTM. Each of these models was trained with either a pre-trained or a post-trained word embedding layer. Three traditional machine learning models were also employed for comparison with the neural networks. Each model produced both binary and multi-class classifications. Our results showed that the CNN model with a pre-trained embedding layer performed best for both binary and multi-class classification.
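The abstract does not give architecture details, so the following is only a minimal sketch of what a forward pass through a CNN text classifier with a frozen ("pre-trained") embedding layer can look like, written in plain Python. The cleaning rule in `clean_note`, the embedding table, and all filter and output weights are hypothetical placeholders, not the authors' pipeline or trained parameters, and training is omitted entirely.

```python
import math
import re


def clean_note(text):
    """Hypothetical cleaning step: lowercase, replace non-letters with spaces, tokenize on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def embed(tokens, embeddings, dim):
    """Frozen embedding lookup; tokens missing from the table map to a zero vector."""
    zero = [0.0] * dim
    return [embeddings.get(tok, zero) for tok in tokens]


def conv1d_maxpool(vectors, filters, biases):
    """Slide each filter (width x dim) over the token sequence, apply ReLU, then global max-pool."""
    pooled = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is also the value for too-short inputs
        for start in range(len(vectors) - width + 1):
            s = bias
            for offset in range(width):
                s += sum(w * x for w, x in zip(filt[offset], vectors[start + offset]))
            best = max(best, max(0.0, s))
        pooled.append(best)
    return pooled


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax for a multi-class output layer."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_smoker(text, embeddings, filters, biases, out_w, out_b, dim):
    """Binary head: probability that the note indicates a smoker (toy weights only)."""
    vectors = embed(clean_note(text), embeddings, dim)
    features = conv1d_maxpool(vectors, filters, biases)
    logit = out_b + sum(w * x for w, x in zip(out_w, features))
    return sigmoid(logit)
```

For the multi-class setting, the single sigmoid output would be replaced by one logit per smoking-status class passed through `softmax`; the actual class labels are not specified in this abstract. A "post-trained" embedding would differ only in that the lookup table is learned jointly with the filters instead of being held fixed.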