Droice Research, New York, NY, USA.
Department of Surgery, NYU Langone Hospital Long Island, Mineola, NY, USA.
Sci Rep. 2020 Dec 7;10(1):21340. doi: 10.1038/s41598-020-77286-6.
As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to defining specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving disease progression, and therefore have limited value for clinical decision-making and the development of novel therapies. Here we present a novel, data-driven approach to understanding and characterizing the real-world manifestation of HF by clustering disease- and symptom-related clinical concepts (complaints) captured from unstructured electronic health record clinical notes. We used natural language processing to construct vectorized representations of patient complaints, followed by clustering to group HF patients by similarity of their complaint vectors. We then used statistical testing to identify complaints that were significantly enriched within each cluster. Breaking the HF population into groups of similar patients revealed a clinically interpretable hierarchy of subgroups characterized by similar HF manifestation. Importantly, our methodology recovered well-known etiologies, risk factors, and comorbid conditions of HF (including ischemic heart disease, aortic valve disease, atrial fibrillation, congenital heart disease, various cardiomyopathies, obesity, hypertension, diabetes, and chronic kidney disease) and yielded additional insights into the details of each subgroup's clinical manifestation of HF. Our approach is entirely hypothesis-free and can therefore be readily applied to the discovery of novel insights in other diseases or patient populations.
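The three steps named in the abstract (vectorize complaints, cluster patients by vector similarity, test each cluster for enriched complaints) can be illustrated with a minimal sketch. The abstract does not specify the embedding, the clustering algorithm, or the statistical test, so everything below is an assumption made for illustration: binary bag-of-complaints vectors, cosine similarity with average-linkage agglomerative clustering, and a one-sided hypergeometric test. The patient data and the function names (`vectorize`, `cluster`, `enrichment_p`) are hypothetical and are not taken from the study.

```python
from math import comb, sqrt

# Hypothetical toy data: patient id -> complaints extracted from clinical notes.
PATIENTS = {
    "p1": {"chest pain", "ischemic heart disease", "hypertension"},
    "p2": {"chest pain", "ischemic heart disease", "diabetes"},
    "p3": {"chest pain", "ischemic heart disease", "hypertension"},
    "p4": {"palpitations", "atrial fibrillation", "dyspnea"},
    "p5": {"palpitations", "atrial fibrillation", "fatigue"},
    "p6": {"palpitations", "atrial fibrillation", "dyspnea"},
}


def vectorize(patients):
    """Assumed representation: one binary vector per patient over a sorted vocabulary."""
    vocab = sorted(set().union(*patients.values()))
    vectors = {
        pid: [1.0 if term in complaints else 0.0 for term in vocab]
        for pid, complaints in patients.items()
    }
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, k):
    """Assumed method: average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[pid] for pid in sorted(vectors)]

    def linkage(a, b):
        # Mean pairwise cosine similarity between the members of two clusters.
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the most similar pair
        del clusters[j]
    return [sorted(c) for c in clusters]


def enrichment_p(patients, members, complaint):
    """One-sided hypergeometric p-value: chance of seeing at least the observed
    number of patients with `complaint` in a random group of the same size."""
    n_total = len(patients)
    n_with = sum(complaint in c for c in patients.values())
    n_group = len(members)
    observed = sum(complaint in patients[pid] for pid in members)
    upper = min(n_with, n_group)
    tail = sum(
        comb(n_with, i) * comb(n_total - n_with, n_group - i)
        for i in range(observed, upper + 1)
    )
    return tail / comb(n_total, n_group)


if __name__ == "__main__":
    vocab, vectors = vectorize(PATIENTS)
    for members in cluster(vectors, k=2):
        scores = {term: enrichment_p(PATIENTS, members, term) for term in vocab}
        top = sorted(scores, key=scores.get)[:3]
        print(members, [(term, round(scores[term], 3)) for term in top])
```

On a real cohort, the p-values would also need correction for multiple testing, because every complaint is tested in every cluster. Cutting the merge tree at several values of `k` is one simple way to obtain nested subgroups, in the spirit of the hierarchy the abstract describes.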