Cusick Marika, Adekkanattu Prakash, Campion Thomas R, Sholle Evan T, Myers Annie, Banerjee Samprit, Alexopoulos George, Wang Yanshan, Pathak Jyotishman
Department of Information and Technology Services, Weill Cornell Medicine, New York, USA; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.
Department of Information and Technology Services, Weill Cornell Medicine, New York, USA.
J Psychiatr Res. 2021 Apr;136:95-102. doi: 10.1016/j.jpsychires.2021.01.052. Epub 2021 Feb 2.
Mental health concerns, such as suicidal thoughts, are frequently documented by providers in clinical notes rather than as structured coded data. In this study, we evaluated weakly supervised methods for detecting "current" suicidal ideation in unstructured clinical notes from electronic health record (EHR) systems. Weakly supervised machine learning methods train on imperfect labels, alleviating the burden of creating a large manually annotated dataset. After identifying a cohort of 600 patients at risk for suicidal ideation, we used a rule-based natural language processing (NLP) approach to label the training and validation notes (n = 17,978). Using this large corpus of clinical notes, we trained three statistical machine learning models (a logistic classifier, a support vector machine [SVM], and a Naive Bayes classifier) and one deep learning model, a text classification convolutional neural network (CNN). All models were evaluated on a manually reviewed test set (n = 837). The CNN model outperformed all other methods, achieving an overall accuracy of 94% and an F1-score of 0.82 on documents with "current" suicidal ideation. The algorithm correctly identified an additional 42 encounters and 9 patients indicative of suicidal ideation but lacking a structured diagnosis code. When applied to a random subset of 5,000 clinical notes, the algorithm classified 0.46% (n = 23) as indicating "current" suicidal ideation, of which 87% were confirmed by manual review. Implementing this approach for large-scale document screening may play an important role in point-of-care clinical information systems for targeted suicide prevention interventions and may improve research on the pathways from ideation to attempt.
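The abstract does not publish the authors' rules or model code. The following is a minimal Python sketch of the general weak-supervision pattern it describes: noisy rule-based labels train a statistical classifier, which is then scored on a small hand-labeled set. It rests on several assumptions. The keyword and exclusion patterns (`SI_PATTERNS`, `EXCLUSION_CUES`) are hypothetical. A hand-rolled multinomial Naive Bayes stands in for the study's models, and the CNN is not reproduced. All notes are made-up examples, not study data.

```python
import math
import re
from collections import Counter

# Hypothetical rules: the study's actual rule set is not described in the abstract.
SI_PATTERNS = [
    r"suicidal ideation",
    r"thoughts? of (killing|hurting) (himself|herself|myself)",
    r"wants? to die",
]
# Hypothetical cues that mark a mention as negated or historical, not "current".
EXCLUSION_CUES = [r"\bdenies\b", r"\bno\b", r"\bnegative for\b",
                  r"\bhistory of\b", r"\bpast\b", r"\bprior\b"]


def weak_label(note: str) -> int:
    """Noisy label: 1 if any sentence mentions SI without an exclusion cue, else 0."""
    for sentence in re.split(r"[.;\n]", note.lower()):
        if any(re.search(p, sentence) for p in SI_PATTERNS):
            if not any(re.search(c, sentence) for c in EXCLUSION_CUES):
                return 1
    return 0


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        return self

    def _log_posterior(self, doc, c):
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokenize(doc):
            if tok in self.vocab:  # skip words never seen during training
                score += math.log((self.counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._log_posterior(doc, c))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Per-class precision, recall, and F1 for the positive ("current" SI) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up training notes, labeled only by the rules (no manual annotation).
train_notes = [
    "Patient reports suicidal ideation with plan.",
    "Patient denies suicidal ideation.",
    "History of suicidal ideation in 2015; currently stable.",
    "Endorses thoughts of killing himself since last week.",
    "Follow-up for hypertension, no acute complaints.",
    "States she wants to die and has been isolating.",
]
weak_labels = [weak_label(n) for n in train_notes]
model = NaiveBayes().fit(train_notes, weak_labels)

# A tiny hand-labeled test set, standing in for the manually reviewed notes.
test_notes = ["Says he wants to die.", "Denies acute complaints today."]
test_labels = [1, 0]
predictions = [model.predict(n) for n in test_notes]
print(weak_labels, predictions, precision_recall_f1(test_labels, predictions))
```

Under this design, only the model step would change if the Naive Bayes were swapped for a logistic classifier, an SVM, or a CNN. The weak-labeling and evaluation steps would stay the same.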