Workman Terri Elizabeth, Goulet Joseph L, Brandt Cynthia A, Warren Allison R, Eleazer Jacob, Skanderson Melissa, Lindemann Luke, Blosnich John R, O'Leary John, Zeng-Treitler Qing
Biomedical Informatics Center The George Washington University Washington District of Columbia USA.
VA Medical Center Washington District of Columbia USA.
Health Sci Rep. 2023 Sep 11;6(9):e1526. doi: 10.1002/hsr2.1526. eCollection 2023 Sep.
In deep learning, a major difficulty in identifying suicidality and its risk factors in clinical notes is the shortage of training samples: true positive instances are few relative to the number of patients screened. This paper describes a novel methodology that identifies suicidality in clinical notes by addressing this data sparsity issue through zero-shot learning. Our general aim was to develop a tool that leverages zero-shot learning to effectively identify suicidality documentation in all types of clinical notes.
US Veterans Affairs clinical notes served as the data. Training data set labels were determined using diagnostic codes for suicide attempt and self-harm. A base string associated with the target label of suicidality provided auxiliary information: we narrowed the positive training cases to those containing the base string. We trained a deep neural network by mapping the training documents' contents to a semantic space. For comparison, we trained another deep neural network using the identical training data set labels and bag-of-words features.
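The label-narrowing step can be illustrated with a minimal sketch. Everything named here is hypothetical: the abstract does not give the base string, so `"suicid"` is a placeholder, and the `Note` type and `assign_training_label` function are invented for illustration. The sketch also assumes that coded notes lacking the base string are excluded from training and that uncoded notes serve as negatives; the abstract states neither.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    text: str
    has_code: bool  # True if linked to a suicide attempt or self-harm diagnostic code


def assign_training_label(note: Note, base_string: str = "suicid") -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (excluded from training).

    Assumptions, not taken from the paper: the base string value, the
    case-insensitive substring match, and the handling of negatives
    and excluded notes.
    """
    if not note.has_code:
        return 0  # assumed: notes without a relevant code are negatives
    if base_string.lower() in note.text.lower():
        return 1  # coded and contains the base string: positive
    return None  # assumed: coded but lacking the base string is dropped


notes = [
    Note("Patient reports suicidal ideation.", True),
    Note("Laceration repair, left forearm.", True),
    Note("Routine follow-up for hypertension.", False),
]
labels = [assign_training_label(n) for n in notes]
```

The design point is that the diagnostic code supplies a noisy label, while the base string restricts positives to notes whose text actually mentions the target concept.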
The zero-shot learning model outperformed the baseline model in terms of area under the curve, sensitivity, specificity, and positive predictive value at multiple probability thresholds. At a 0.90 probability threshold, the methodology identified notes that documented suicidality but were not associated with a relevant ICD-10-CM code, with 94% accuracy.
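As a rough illustration of how sensitivity, specificity, and positive predictive value depend on the probability threshold, the sketch below computes them from scratch. The function name `threshold_metrics`, the `>=` comparison, and the toy labels and probabilities are assumptions made for illustration; they are not the study's code or data.

```python
def threshold_metrics(y_true, y_prob, threshold):
    """Compute sensitivity, specificity, and PPV at one probability threshold.

    A note is predicted positive when its probability is >= threshold
    (an assumed convention). Metrics with a zero denominator are None.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
    }


# Toy values, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.95, 0.92, 0.60, 0.91, 0.30, 0.10]

metrics_by_threshold = {t: threshold_metrics(y_true, y_prob, t) for t in (0.50, 0.90)}
```

On this toy data, raising the threshold from 0.50 to 0.90 lowers sensitivity because one true positive (probability 0.60) is no longer flagged.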
This method can effectively identify suicidality without manual annotation.