Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
BMC Med Inform Decis Mak. 2018 May 29;18(1):30. doi: 10.1186/s12911-018-0617-7.
We compared the performance of structured diagnostic codes with that of natural language processing (NLP) of unstructured text in screening for suicidal behavior among pregnant women in electronic medical records (EMRs).
Women aged 10-64 years with at least one diagnostic code related to pregnancy or delivery (N = 275,843) from Partners HealthCare were included as our "datamart." Diagnostic codes related to suicidal behavior were applied to the datamart to screen women for suicidal behavior. Among women without any diagnostic codes related to suicidal behavior (n = 273,410), 5,880 women were randomly sampled, of whom 1,120 had at least one mention of terms related to suicidal behavior in their clinical notes. NLP was then used to process the clinical notes of these 1,120 women. Chart reviews were performed for subsamples of women.
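The cohort counts above can be tallied in a few lines. This is only an arithmetic sketch over the numbers reported in this abstract, not the authors' pipeline; the variable names are illustrative, and the count of women with a suicidal-behavior code is derived here by subtraction rather than stated in the text.

```python
# Counts as reported in the abstract.
datamart = 275_843        # women with a pregnancy- or delivery-related code
without_codes = 273_410   # no diagnostic code related to suicidal behavior
sampled = 5_880           # random sample drawn from the women without codes
with_mentions = 1_120     # sampled women with at least one term mention in notes

# Derived here (assumption: the two groups partition the datamart).
with_codes = datamart - without_codes

# Share of the no-code group that was sampled, and share of the sample
# whose notes mentioned terms related to suicidal behavior.
sampling_fraction = sampled / without_codes
mention_rate = with_mentions / sampled

print(f"women with a suicidal-behavior code (derived): {with_codes:,}")
print(f"sampling fraction: {sampling_fraction:.2%}")
print(f"sampled women with a term mention: {mention_rate:.1%}")
```

Only the women with a term mention went on to NLP processing, so the NLP results describe a sample of the women without codes, not the whole group.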
Using diagnostic codes, 196 pregnant women screened positive for suicidal behavior, of whom 149 (76%) had suicidal behavior confirmed by chart review. Using NLP among those without diagnostic codes, 486 pregnant women screened positive for suicidal behavior, of whom 146 (30%) had suicidal behavior confirmed by chart review.
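The two percentages are the share of screen-positive women whose suicidal behavior was confirmed by chart review, which is a positive predictive value (PPV). A minimal sketch of that calculation, using only the counts given above (the function name `ppv` is illustrative):

```python
def ppv(confirmed: int, screened_positive: int) -> float:
    """Positive predictive value: confirmed cases among screen-positives."""
    if screened_positive <= 0:
        raise ValueError("screened_positive must be a positive count")
    if not 0 <= confirmed <= screened_positive:
        raise ValueError("confirmed must lie between 0 and screened_positive")
    return confirmed / screened_positive


# Counts as reported in the abstract.
ppv_codes = ppv(confirmed=149, screened_positive=196)  # diagnostic codes
ppv_nlp = ppv(confirmed=146, screened_positive=486)    # NLP, women without codes

print(f"PPV, diagnostic codes: {ppv_codes:.0%}")
print(f"PPV, NLP among women without codes: {ppv_nlp:.0%}")
```

The confirmed counts are similar (149 vs. 146) while the PPVs differ sharply: NLP found a comparable number of additional cases among women without codes, at the cost of more false positives.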
The use of NLP substantially improves the sensitivity of screening for suicidal behavior in EMRs. However, the proportion of women with confirmed suicidal behavior was lower among those who had no diagnostic codes for suicidal behavior but screened positive by NLP. NLP should be used together with diagnostic codes in future EMR-based phenotyping studies of suicidal behavior.