Qiu Jiajun, Hu Yao, Li Li, Erzurumluoglu Abdullah Mesut, Braenne Ingrid, Whitehurst Charles, Schmitz Jochen, Arora Jatin, Bartholdy Boris Alexander, Gandhi Shrey, Khoueiry Pierre, Mueller Stefanie, Noyvert Boris, Ding Zhihao, Jensen Jan Nygaard, de Jong Johann
Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
Immunology & Respiratory Diseases, Boehringer-Ingelheim, Ridgefield, CT, USA.
Nat Commun. 2025 Mar 14;16(1):2534. doi: 10.1038/s41467-025-56625-z.
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn's disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn's disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.