Escudié Jean-Baptiste, Rance Bastien, Malamut Georgia, Khater Sherine, Burgun Anita, Cellier Christophe, Jannot Anne-Sophie
Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France.
INSERM UMRS 1138, Paris Descartes University, Paris, France.
BMC Med Inform Decis Mak. 2017 Sep 29;17(1):140. doi: 10.1186/s12911-017-0537-y.
Data collected in electronic health records (EHRs) have been widely used to identify specific conditions; however, there is still a need for methods to define comorbidities and for data sources to quantify comorbidity burden. We propose an approach to assess the comorbidity burden of a specific disease using the literature and EHR data sources, applied here to autoimmune diseases in celiac disease (CD).
We generated a restricted set of comorbidities from the literature (via the MeSH® co-occurrence file) by extracting the 15 autoimmune diseases that co-occur most often with CD. We mapped these comorbidities to EHR terminologies: ICD-10 (billing codes), ATC (drugs) and UMLS (clinical reports). Finally, we extracted the mapped concepts from the different data sources. We evaluated our approach using the correlation between prevalence estimates in our cohort and co-occurrence ranking in the literature.
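The evaluation step described above, a rank correlation between cohort prevalence and literature co-occurrence, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and Spearman's coefficient is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold the literature co-occurrence counts of the 15 comorbidities and `y` their prevalence estimates in the cohort, in the same order. The p-value would need a separate test, which is not sketched here.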
We retrieved the comorbidities for 741 patients with CD. Of these, 18.1% had at least one of the 15 studied autoimmune disorders. Overall, 79.3% of the mapped concepts were detected only in text, 5.3% only in ICD codes and/or drug prescriptions, and 15.4% in both sources. Prevalence estimates in our cohort were correlated with co-occurrence ranking in the literature (Spearman's coefficient 0.789, p = 0.0005). The three most prevalent comorbidities were thyroiditis 12.6% (95% CI 10.1-14.9), type 1 diabetes 2.3% (95% CI 1.2-3.4) and dermatitis herpetiformis 2.0% (95% CI 1.0-3.0).
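For readers who want to see how a prevalence estimate and its 95% CI relate, the sketch below uses the normal-approximation (Wald) interval. The abstract does not state which interval method was used, so this is an assumption, and the function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(cases, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not say how its intervals were computed;
    the Wald interval is used here only as a common default.
    """
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] so the bounds remain valid proportions.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# 17 cases is an assumed count, back-calculated from the reported 2.3% of 741.
low, high = wald_ci(17, 741)
print(f"{low:.1%} to {high:.1%}")
```

With the assumed count of 17 cases among 741 patients, this gives roughly 1.2% to 3.4%, in line with the interval reported for type 1 diabetes.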
We introduced a process that leverages the MeSH terminology to identify relevant autoimmune comorbidities of CD and combines several EHR data sources to phenotype a large population of CD patients. We achieved prevalence estimates comparable to those in the literature.