Department of Medical and Surgical Sciences and Biotechnologies, "Sapienza" University of Rome, 04100 Latina, Italy.
Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy.
Sensors (Basel). 2024 Jun 3;24(11):3613. doi: 10.3390/s24113613.
The interpretability of gait analysis studies in people with rare diseases, such as people with primary hereditary cerebellar ataxia (pwCA), is frequently limited by small sample sizes and unbalanced datasets. The purpose of this study was to assess the effectiveness of data balancing and generative artificial intelligence (AI) algorithms in generating synthetic data that reflect the actual gait abnormalities of pwCA. Gait data of 30 pwCA (age: 51.6 ± 12.2 years; 13 females, 17 males) and 100 healthy subjects (age: 57.1 ± 10.4 years; 60 females, 40 males) were collected at the lumbar level with an inertial measurement unit. Subsampling, oversampling, synthetic minority oversampling, generative adversarial networks, and conditional tabular generative adversarial networks (ctGAN) were applied to generate datasets that were then input to a random forest classifier. Consistency and explainability metrics were also calculated to assess the coherence of the generated datasets with known gait abnormalities of pwCA. ctGAN significantly improved classification performance compared with the original dataset and with traditional data augmentation methods. ctGAN is an effective method for balancing tabular datasets from populations with rare diseases, owing to its ability to improve diagnostic models while keeping explainability consistent.
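The three traditional balancing strategies named above (subsampling, oversampling, and synthetic minority oversampling) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the two-feature rows, and the neighbour count `k=5` are assumptions made for illustration, and the feature values are random placeholders rather than study data. Only the class sizes (30 and 100) are taken from the abstract. The GAN, ctGAN, and random forest steps are not shown.

```python
import random


def random_subsample(majority, target_size, rng):
    """Keep a random subset of the majority class (no replacement)."""
    return rng.sample(majority, target_size)


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def smote_like(minority, target_size, k, rng):
    """SMOTE-style oversampling: each synthetic row is a random point on the
    segment between a minority row and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sq_dist(base, row),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return minority + synthetic


rng = random.Random(0)

# Placeholder rows with two hypothetical gait features; NOT real data.
patients = [[rng.gauss(1.0, 0.3), rng.gauss(0.5, 0.2)] for _ in range(30)]
controls = [[rng.gauss(0.6, 0.2), rng.gauss(0.8, 0.2)] for _ in range(100)]

controls_sub = random_subsample(controls, len(patients), rng)
patients_over = random_oversample(patients, len(controls), rng)
patients_smote = smote_like(patients, len(controls), k=5, rng=rng)

print(len(controls_sub), len(patients_over), len(patients_smote))
```

Subsampling discards majority-class rows, oversampling repeats existing minority rows, and the SMOTE-style variant creates new rows that stay between existing minority observations. Generative models such as ctGAN instead learn the joint distribution of the table, so their samples are not restricted to those segments.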