The Hospital for Sick Children and University of Toronto, Toronto, Ontario, Canada.
Arthritis Rheumatol. 2014 Dec;66(12):3463-75. doi: 10.1002/art.38875.
Childhood arthritis encompasses a heterogeneous family of diseases. Significant variation in clinical presentation remains despite consensus-driven diagnostic classifications. Developments in data analysis provide powerful tools for interrogating large heterogeneous data sets. We report a novel approach to integrating biologic and clinical data toward a new classification for childhood arthritis, using computational biology for data-driven pattern recognition.
Probabilistic principal components analysis was used to transform a large set of data into 4 interpretable indicators, or composite variables, on which patients were grouped by cluster analysis. Sensitivity analysis was conducted to identify the key variables driving the indicators and cluster assignment. Results were validated in an independent cohort.
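As an illustration of the two-step approach described above (dimension reduction to 4 indicators, then clustering), the following is a minimal sketch and not the authors' implementation. It assumes complete data and uses the closed-form maximum-likelihood solution for probabilistic PCA; the clustering step uses a plain k-means, which is an assumption, since the abstract does not name the clustering algorithm. The function names, the synthetic data, and the choice of 3 clusters are all hypothetical.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form maximum-likelihood probabilistic PCA for complete data."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                      # noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def ppca_transform(X, mu, W, sigma2):
    """Posterior mean of the latent variables given each observation."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed here; the source does not specify the method)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for a patients-by-variables matrix (hypothetical data).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize each variable

mu, W, sigma2 = ppca_fit(X, q=4)                     # 4 indicators, as in the abstract
Z = ppca_transform(X, mu, W, sigma2)                 # patient scores on the 4 indicators
labels, centers = kmeans(Z, k=3)                     # k=3 is an arbitrary illustrative choice
print(Z.shape, np.bincount(labels, minlength=3))
```

In practice, one attraction of the probabilistic formulation is that it can be fitted by expectation-maximization when some measurements are missing; the closed-form fit here sidesteps that and is only valid for complete data.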
Meaningful biologic and clinical characteristics, including levels of proinflammatory cytokines and measures of disease activity, defined the axes (indicators) along which cluster analysis identified homogeneous patient subgroups. The new patient classifications resolved major differences between patient subpopulations better than International League of Associations for Rheumatology subtypes. Sensitivity analysis identified 14 variables as crucial in determining the indicators and clusters. This new schema was conserved in an independent validation cohort.
Data-driven unsupervised machine learning is a powerful approach for interrogating clinical and biologic data toward disease classification, providing insight into the biology underlying clinical heterogeneity in childhood arthritis. Our analytical framework enabled the recovery of unique patterns from small cohorts and thereby addresses a major challenge in studying rare diseases: limited patient numbers.