Lambert Judith, Leutenegger Anne-Louise, Jannot Anne-Sophie, Baudot Anaïs
Sorbonne Université, Université Paris Cité, INSERM, Centre de Recherche des Cordeliers, F-75006 Paris, France; HeKA, Inria Paris, F-75015 Paris, France; Aix Marseille Univ, INSERM, MMG, UMR1251, Marseille, France.
Université Paris Cité, INSERM, NeuroDiderot, UMR1141, 75019 Paris, France.
J Biomed Inform. 2023 Mar;139:104309. doi: 10.1016/j.jbi.2023.104309. Epub 2023 Feb 14.
Identifying clusters (i.e., subgroups) of patients in medico-administrative databases is important for understanding disease heterogeneity. However, these databases contain different types of longitudinal variables that are measured over different follow-up periods, which produces truncated data. Clustering approaches that can handle this type of data are therefore needed.
We propose here cluster-tracking approaches to identify clusters of patients from truncated longitudinal data contained in medico-administrative databases.
We first cluster patients at each age. We then track the identified clusters across ages to construct cluster-trajectories. We compare our novel approaches with three classical longitudinal clustering approaches using the silhouette score. As a use case, we analyze antithrombotic drug use from 2008 to 2018 recorded in the Échantillon Généraliste des Bénéficiaires (EGB), a French national cohort.
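The tracking step can be illustrated with a minimal sketch. It assumes that per-age cluster labels have already been computed, and it links clusters at consecutive ages by the Jaccard overlap of the patients observed at both ages. The linking criterion, the `min_overlap` threshold, and all function names are illustrative assumptions, not the procedure used in the paper.

```python
from collections import defaultdict


def clusters_by_label(assignment, keep):
    """Group patient ids by cluster label, keeping only patients in `keep`."""
    groups = defaultdict(set)
    for patient, label in assignment.items():
        if patient in keep:
            groups[label].add(patient)
    return dict(groups)


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_clusters(assignments_by_age, min_overlap=0.5):
    """Link clusters at consecutive ages (hypothetical linking rule).

    assignments_by_age maps age -> {patient_id: cluster_label}.
    Patients missing at an age (truncated follow-up) are simply absent
    from that age's dictionary; no imputation is performed.
    Returns a list of ((age, label), (next_age, next_label), overlap).
    """
    ages = sorted(assignments_by_age)
    edges = []
    for age, next_age in zip(ages, ages[1:]):
        # Compare only the patients observed at both ages.
        shared = set(assignments_by_age[age]) & set(assignments_by_age[next_age])
        current = clusters_by_label(assignments_by_age[age], shared)
        following = clusters_by_label(assignments_by_age[next_age], shared)
        for label, members in current.items():
            for next_label, next_members in following.items():
                overlap = jaccard(members, next_members)
                if overlap >= min_overlap:
                    edges.append(((age, label), (next_age, next_label), overlap))
    return edges


# Toy data: p1 and p2 leave follow-up after age 61, p4 enters at age 61.
example = {
    60: {"p1": 0, "p2": 0, "p3": 1},
    61: {"p1": 1, "p2": 1, "p3": 0, "p4": 0},
    62: {"p3": 0, "p4": 0},
}
edges = track_clusters(example)
```

Chaining the returned edges from age to age yields cluster-trajectories. Restricting each comparison to patients observed at both ages is one way to tolerate truncation without imputing missing values.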
Our cluster-tracking approaches identify several clinically meaningful cluster-trajectories without any imputation of data. Comparing the silhouette scores obtained with the different approaches shows that the cluster-tracking approaches perform better.
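For reference, the standard definition of the silhouette score is given here. The abstract does not state how scores were aggregated across ages or approaches, so the averaging shown is only the usual convention.

```latex
% a(i): mean distance from patient i to the other members of its own cluster
% b(i): smallest mean distance from patient i to the members of any other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1,
\qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Values of $\bar{s}$ close to 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping clusters or patients that sit closer to another cluster than to their own.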
The cluster-tracking approaches are a novel and efficient alternative for identifying patient clusters from medico-administrative databases that takes the specificities of these databases into account.