Genolini Christophe, Ecochard René, Benghezal Mamoun, Driss Tarak, Andrieu Sandrine, Subtil Fabien
Inserm UMR 1027, University of Toulouse III, Toulouse, France.
CeRSM (EA 2931), UFR STAPS, University Paris Ouest-Nanterre-La Défense, Nanterre, France.
PLoS One. 2016 Jun 3;11(6):e0150738. doi: 10.1371/journal.pone.0150738. eCollection 2016.
Longitudinal data are data in which each variable is measured repeatedly over time. One way to analyze such data is to cluster them. Most clustering methods group together individuals whose trajectories are close at given time points. These methods therefore group trajectories that are locally close, but not necessarily those that have similar shapes. In several circumstances, however, the progress of a phenomenon may matter more than the moment at which it occurs. One would thus like a partition in which each group gathers individuals whose trajectories have similar shapes, regardless of any time lag between them.
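To make the contrast concrete, the sketch below compares a classical distance, computed point by point at the same time points, with dynamic time warping (DTW), one common shape-based dissimilarity that tolerates time lags. DTW is used here only as a familiar stand-in: the abstract does not say which shape-based measure the algorithm relies on, and the trajectories `early`, `late` and `flat` are hypothetical.

```python
import math


def euclidean(a, b):
    """Classical distance: compares the two series value by value
    at the same time points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping: finds the cheapest monotone alignment of the
    two series, so a time lag between similar shapes costs little."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical trajectories: the same single peak, shifted by two time
# points, and a flat trajectory with a different shape.
early = [0, 0, 1, 5, 1, 0, 0, 0, 0]
late = [0, 0, 0, 0, 1, 5, 1, 0, 0]
flat = [1, 1, 1, 1, 1, 1, 1, 1, 1]

print(round(euclidean(early, late), 2), round(euclidean(early, flat), 2))  # 7.21 4.69
print(dtw(early, late), dtw(early, flat))  # 0.0 10.0
```

Under the point-by-point distance, the two shifted peaks look farther apart than a peak and a flat line; DTW reverses that ranking because it lets the peaks line up before comparing values.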
In this article, we present a longitudinal data partitioning algorithm based on the shapes of the trajectories rather than on classical distances. Because this algorithm is time-consuming, we also propose two data simplification procedures that make it applicable to high-dimensional datasets.
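A shape-based partition can be sketched by plugging such a dissimilarity into a standard clustering loop. The version below is a minimal illustration under several assumptions not taken from the abstract: DTW as the dissimilarity, k-medoids with a farthest-first start as the partitioning step, and block averaging (piecewise aggregate approximation) as a hypothetical stand-in for the unnamed data simplification procedures. The helper names `simplify` and `cluster_by_shape` are likewise hypothetical. The sketch also shows where the time cost comes from: every pair of trajectories needs a DTW table whose size is the product of their lengths, so shortening the series first pays off quickly.

```python
def dtw(a, b):
    """Dynamic time warping distance (stand-in shape-based dissimilarity)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def simplify(series, n_points):
    """Hypothetical simplification: average consecutive blocks so the series
    keeps only n_points values (piecewise aggregate approximation)."""
    n = len(series)
    if not 1 <= n_points <= n:
        raise ValueError("n_points must be between 1 and len(series)")
    out = []
    for k in range(n_points):
        block = series[k * n // n_points:(k + 1) * n // n_points]
        out.append(sum(block) / len(block))
    return out


def cluster_by_shape(trajectories, k, n_points, max_iter=20):
    """k-medoids partition of the trajectories under DTW, after simplification."""
    series = [simplify(t, n_points) for t in trajectories]
    n = len(series)
    # Full n x n distance matrix: the expensive step (DTW is symmetric,
    # so half of these calls could be skipped).
    dist = [[dtw(series[i], series[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        # Index of the nearest medoid for every trajectory.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Deterministic farthest-first initialisation of the medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # Member with the smallest total distance to the rest of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Hypothetical data: three shifted peaks and three shifted drops.
peaks = [
    [0, 0, 4, 4, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 4, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 4, 4, 0, 0],
]
drops = [
    [4, 4, 2, 2, 0, 0, 0, 0, 0, 0],
    [4, 4, 4, 4, 2, 2, 0, 0, 0, 0],
    [4, 4, 4, 4, 4, 4, 2, 2, 0, 0],
]
labels, medoids = cluster_by_shape(peaks + drops, k=2, n_points=5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Shortening each trajectory from 10 to 5 points cuts each DTW table from 100 cells to 25, and on these toy data the peaks and the drops still end up in separate groups despite their time lags. Whether such a crude simplification preserves shape well enough on real data would need checking.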
In an application to Alzheimer disease, this algorithm revealed a "rapid decline" patient group that was not found by the classical methods. In another application, to the female menstrual cycle, the algorithm showed, contrary to the current literature, that luteinizing hormone presents two peaks in a substantial proportion of women (22%).