Health Equity Research Lab, Cambridge Health Alliance, Cambridge, MA, United States of America.
Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America.
PLoS One. 2019 Feb 19;14(2):e0211116. doi: 10.1371/journal.pone.0211116. eCollection 2019.
The rapid proliferation of machine learning research using electronic health records to classify healthcare outcomes offers an opportunity to address the pressing public health problem of adolescent suicidal behavior. We describe the development and evaluation of a machine learning algorithm using natural language processing of electronic health records to identify suicidal behavior among psychiatrically hospitalized adolescents.
Adolescents hospitalized on a psychiatric inpatient unit in a community health system in the northeastern United States were surveyed for history of suicide attempt in the past 12 months. A total of 73 respondents had electronic health records available prior to the index psychiatric admission. Unstructured clinical notes were downloaded from the year preceding the index inpatient admission. Natural language processing identified phrases from the notes associated with the suicide attempt outcome. We enriched this group of phrases with a clinically focused list of terms representing known risk and protective factors for suicide attempt in adolescents. We then applied the random forest machine learning algorithm to develop a classification model. Model performance was evaluated using sensitivity, specificity, area under the curve (AUC), positive predictive value (PPV), negative predictive value (NPV), and accuracy.
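For readers less familiar with the evaluation measures named above, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, and accuracy follow from a binary confusion matrix. The class `ConfusionCounts`, the function `classification_metrics`, and the counts in the example are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical helper type)."""
    tp: int  # attempt reported, model flagged
    fp: int  # no attempt reported, model flagged
    tn: int  # no attempt reported, model did not flag
    fn: int  # attempt reported, model did not flag


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the five measures; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only; NOT taken from the study.
example = ConfusionCounts(tp=8, fp=4, tn=6, fn=2)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the classifier itself, whereas PPV and NPV also depend on how common the outcome is in the evaluated sample.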
The final model had a sensitivity of 0.83, specificity of 0.22, AUC of 0.68, PPV of 0.42, NPV of 0.67, and accuracy of 0.47. The terms most highly associated with suicide attempt clustered around suicide, family members, psychiatric disorders, and psychotropic medications.
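The reported figures can be cross-checked against one another. The sketch below assumes that all measures were computed on the same evaluation set, which the abstract does not state explicitly. Under that assumption, accuracy equals sensitivity × prevalence + specificity × (1 − prevalence), so the outcome prevalence can be backed out and used to recompute PPV and NPV. The function names are hypothetical, and the inputs are the rounded values from the abstract, so small discrepancies are expected.

```python
def implied_prevalence(sens: float, spec: float, acc: float) -> float:
    """Solve acc = sens * p + spec * (1 - p) for the prevalence p."""
    return (acc - spec) / (sens - spec)


def predictive_values(sens: float, spec: float, prev: float) -> tuple:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Rounded values as reported in the abstract.
SENS, SPEC, ACC = 0.83, 0.22, 0.47

prev = implied_prevalence(SENS, SPEC, ACC)
ppv, npv = predictive_values(SENS, SPEC, prev)
print(f"implied prevalence: {prev:.2f}")
print(f"recomputed PPV: {ppv:.2f}, recomputed NPV: {npv:.2f}")
```

With these inputs the implied prevalence is roughly 0.41, and the recomputed PPV and NPV land close to the reported 0.42 and 0.67, with the remaining gap plausibly due to rounding. The low specificity means most adolescents without a reported attempt were still flagged, which is why accuracy falls below 0.5 despite the high sensitivity.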
This analysis demonstrates modest success of a natural language processing and machine learning approach to identifying suicide attempt among a small sample of hospitalized adolescents in a psychiatric setting.