Mota Marco Barbero, Still John M, Gamboa Jorge L, Strobl Eric V, Stein Charles M, Kawai Vivian K, Lasko Thomas A
Vanderbilt University Medical Center, Department of Biomedical Informatics.
Vanderbilt DBMI PhD program, 2525 West End Ave, Nashville, TN.
AMIA Annu Symp Proc. 2025 May 22;2024:172-181. eCollection 2024.
Systemic lupus erythematosus (SLE) is a complex, heterogeneous disease with many clinical manifestations. We propose a data-driven approach to discover probabilistically independent sources from multimodal, imperfect electronic health record (EHR) data. These sources represent exogenous variables in the causal graph of the data generation process, and they estimate latent root causes of the presence of SLE in the health record. We objectively evaluated the sources against the original variables from which they were discovered by training supervised models to discriminate SLE records from negative health records using a reduced set of labelled instances. We found 19 predictive sources that have high clinical validity and whose EHR signatures define independent factors of SLE heterogeneity. Using the sources as the input patient data representation enables models to provide rich explanations that better capture the clinical reasons why a particular record is (or is not) an SLE case. Providers may be willing to trade patient-level interpretability for discrimination, especially in challenging cases.