Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Harvard Medical School, Boston, MA, United States; Rheumatology Unit, Division of Rheumatology, Allergy, and Immunology, Massachusetts General Hospital, Boston, MA, United States.
Semin Arthritis Rheum. 2021 Feb;51(1):150-157. doi: 10.1016/j.semarthrit.2020.10.012. Epub 2020 Dec 24.
Clinical notes from electronic health records (EHRs) are important for characterizing the natural history, comorbidities, and complications of ANCA-associated vasculitis (AAV) because these details may not be captured by claims and structured data. However, labor-intensive chart review is often required to extract information from notes. We hypothesized that machine learning can automatically discover clinically relevant themes across longitudinal notes to study AAV.
This retrospective study included prevalent PR3- or MPO-ANCA+ AAV cases managed within the Mass General Brigham integrated health care system with providers' notes available between March 1, 1990 and August 23, 2018. We generated clinically relevant topics mentioned in notes using latent Dirichlet allocation-based topic modeling and conducted trend analyses of those topics over the 2 years before and 5 years after the initiation of AAV-specific treatment.
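The trend analysis described above can be illustrated with a minimal sketch. It assumes each note already carries a vector of topic proportions (for example, from a fitted latent Dirichlet allocation model) and averages those proportions by month relative to treatment initiation. The monthly bin width, the function names (`month_offset`, `topic_trends`), and the input layout are assumptions made for illustration; the abstract does not specify how the authors binned or aggregated notes.

```python
from collections import defaultdict
from datetime import date

# Window taken from the abstract: 2 years before to 5 years after treatment start.
# Monthly binning is an assumption, not something the abstract states.
MONTHS_BEFORE = 24
MONTHS_AFTER = 60


def month_offset(note_date: date, index_date: date) -> int:
    """Whole months from treatment initiation to the note (negative = before)."""
    months = (note_date.year - index_date.year) * 12 + (note_date.month - index_date.month)
    # Step back one month if the note falls earlier in the month than the index day.
    if note_date.day < index_date.day:
        months -= 1
    return months


def topic_trends(notes, index_dates, n_topics):
    """Mean topic proportion per month relative to treatment initiation.

    notes: iterable of (patient_id, note_date, weights), where weights is a
           list of n_topics per-note topic proportions (hypothetical input).
    index_dates: dict mapping patient_id to the treatment initiation date.
    Returns a dict mapping month offset to a list of mean topic proportions.
    """
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for patient_id, note_date, weights in notes:
        offset = month_offset(note_date, index_dates[patient_id])
        # Keep only notes inside the analysis window.
        if -MONTHS_BEFORE <= offset < MONTHS_AFTER:
            for k, w in enumerate(weights):
                sums[offset][k] += w
            counts[offset] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(counts)}


# Toy data, invented for illustration only.
index_dates = {"p1": date(2015, 6, 15), "p2": date(2016, 1, 10)}
notes = [
    ("p1", date(2015, 3, 20), [0.7, 0.3]),
    ("p2", date(2015, 10, 12), [0.5, 0.5]),
    ("p1", date(2015, 8, 1), [0.2, 0.8]),
    ("p1", date(2012, 1, 1), [1.0, 0.0]),  # outside the window, ignored
]
trends = topic_trends(notes, index_dates, n_topics=2)
for month, means in trends.items():
    print(month, [round(x, 3) for x in means])
```

Averaging across all notes in a bin weights patients with many notes more heavily; averaging within each patient first is an equally plausible choice when visit frequency varies.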
The study cohort included 660 patients with AAV. We generated 90 topics using 113,048 available notes. Topics were related to the AAV diagnosis, treatment, symptoms and manifestations (e.g., glomerulonephritis), and complications (e.g., end-stage renal disease, infection). AAV-related symptoms and psychiatric symptoms were mentioned months before treatment initiation. Topics related to pulmonary and renal diseases, diabetes, and infections were common during the disease course but followed distinct temporal patterns.
Automated topic modeling can be used to discover clinically relevant themes and temporal patterns related to the diagnosis, treatment, comorbidities, and complications of AAV from EHR notes. Future research might compare these temporal patterns with those in a non-AAV cohort and leverage clinical notes to identify possible AAV cases prospectively.