Department of Health Science Research, Mayo Clinic, MN, USA.
Int J Med Inform. 2019 Oct;130:103943. doi: 10.1016/j.ijmedinf.2019.08.003. Epub 2019 Aug 6.
Previous biomedical studies have identified many lifestyle exposures that may represent risk factors for dementia in general or for dementia due to Alzheimer's disease (AD). These lifestyle exposures are mainly mentioned in free-text electronic health records (EHRs). However, automatic extraction and assessment of these exposures using EHRs remain understudied.
A natural language processing (NLP) approach was adopted to extract lifestyle exposures and intervention strategies from the clinical notes of 260 patients with clinical diagnoses of AD dementia and 260 age-matched cognitively unimpaired persons. Statistics of lifestyle exposures were compared between these two groups. The mapping results of the NLP extraction were evaluated by comparing the results with data captured independently by clinicians.
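The extraction step can be illustrated with a minimal sketch of dictionary-based matching over clinical notes, followed by per-group counts. This is not the study's pipeline: the lexicon, the function names `extract_exposures` and `count_by_group`, and the toy notes are all hypothetical, and the sketch ignores negation and context, which a real clinical NLP system would need to handle.

```python
import re
from collections import Counter

# Hypothetical lexicon: exposure concept -> surface forms.
# This is NOT the dictionary used in the study.
LEXICON = {
    "smoking": ["smoker", "smoking", "tobacco"],
    "alcohol": ["alcohol", "drinks wine", "beer"],
    "physical_activity": ["exercise", "walks daily", "jogging"],
}


def extract_exposures(note):
    """Return the set of lexicon concepts mentioned in one note.

    Matching is case-insensitive and on whole words only.
    Negation ("no exercise") is not detected in this sketch.
    """
    found = set()
    for concept, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, note, flags=re.IGNORECASE):
                found.add(concept)
                break  # one hit is enough for this concept
    return found


def count_by_group(notes_by_group):
    """Count, per group, how many notes mention each concept."""
    return {
        group: Counter(c for note in notes for c in extract_exposures(note))
        for group, notes in notes_by_group.items()
    }


# Invented toy notes, for illustration only.
toy_notes = {
    "AD": [
        "Former smoker. Drinks wine with dinner.",
        "No exercise reported; tobacco use.",
    ],
    "control": ["Walks daily, occasional beer."],
}
counts = count_by_group(toy_notes)
```

Note that the second toy note is counted as mentioning physical activity even though the mention is negated, which is one reason mapping accuracy has to be checked against manual review.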
Thirty of the fifty-five potentially relevant lifestyle exposures were mentioned in our clinical note dataset. The remaining twenty-five (twenty-two dietary factors and three types of substance abuse) were not found in the clinical notes. Patients with AD dementia were exposed to significantly more of the potential risk factors than the cognitively unimpaired subjects (χ² = 120.31, p < 0.001). The average accuracy of the automated extraction was 74.0% when compared with manual review of 50 randomly selected sample documents.
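For readers unfamiliar with the two statistics reported here, the following sketch computes a Pearson chi-square statistic for a contingency table and a simple accuracy against manual labels. The abstract does not state how the contingency table was laid out or how accuracy was averaged, so the table shape, the function names, and all numbers below are assumptions made for illustration; they are not the study's data.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts
    (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def accuracy(predicted, gold):
    """Fraction of items where the automated label matches manual review."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Invented counts: rows are groups, columns are exposed / not exposed.
toy_table = [[30, 10], [10, 30]]
toy_stat = chi_square(toy_table)

# Invented labels: 1 = exposure mapped, 0 = not mapped.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In the toy table every expected count is 20, so each of the four cells contributes 5 to the statistic.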
We demonstrated the feasibility of using NLP techniques to automatically evaluate a large number of lifestyle habits from free-text EHR data. We found that AD dementia patients were exposed to more of the potential risk factors than the comparison group. Our results also demonstrated the feasibility and accuracy of investigating putative risk factors using NLP techniques.