Department of Biomedical Informatics.
Department of Medicine.
J Am Med Inform Assoc. 2018 Jan 1;25(1):61-71. doi: 10.1093/jamia/ocx059.
Identifying the social determinants of health from electronic health records (EHRs) could provide important insights into health and disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.
We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.
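As a rough illustration of embedding-based lexicon expansion, the sketch below ranks candidate words by cosine similarity to a seed term. It is not the authors' pipeline: the toy vectors, the `expand_lexicon` helper, and the `top_k` parameter are all hypothetical, and real vectors would come from a word2vec model trained on clinical notes.

```python
import math

# Hypothetical toy embeddings for illustration only; a real study would
# load vectors from a word2vec model trained on the note corpus.
TOY_VECTORS = {
    "homeless": [0.9, 0.1, 0.0],
    "shelter": [0.8, 0.2, 0.1],
    "unhoused": [0.85, 0.15, 0.05],
    "aspirin": [0.0, 0.1, 0.9],
    "fracture": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, top_k=2):
    """Rank non-seed words by their highest cosine similarity to any seed."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    scores = {
        word: max(cosine(vec, sv) for sv in seed_vecs)
        for word, vec in vectors.items()
        if word not in seeds
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


candidates = expand_lexicon({"homeless"}, TOY_VECTORS)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")
```

In a workflow like the one described, candidates ranked this way would still be reviewed manually before being added to the lexicon.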
word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparison of searches for the 2 phenotypes showed higher performance for homelessness (AUPRC = 0.95) than for ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), whereas for ACE patients they were mental disorders (36.6%-47.6%).
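For readers unfamiliar with AUPRC, one common estimator is average precision over a ranked result list: the mean of the precision values at each rank where a relevant item appears. The sketch below assumes that estimator and uses made-up relevance labels; the abstract does not state how the reported AUPRC values were computed.

```python
def average_precision(ranked_labels):
    """Average precision for a ranked list of binary relevance labels.

    ranked_labels[0] is the top-ranked result; 1 means an assessor judged
    the result relevant, 0 means not relevant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


# Hypothetical assessor judgments for the top 5 results of one search.
labels = [1, 1, 0, 1, 0]
ap = average_precision(labels)
print(f"average precision = {ap:.3f}")
```

Because the score rewards relevant results appearing early, it suits the manual-review setting described here, where assessors inspect only the top-ranked results.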
We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.