Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W 168th Street, New York, NY 10032, USA.
Biostatistics. 2023 Dec 15;25(1):203-219. doi: 10.1093/biostatistics/kxac042.
Current diagnosis of neurological disorders often relies on late-stage clinical symptoms, which poses a barrier to developing effective interventions at the premanifest stage. Recent research suggests that biomarkers and subtle changes in clinical markers may occur in a time-ordered fashion and can be used as indicators of early disease. In this article, we address the challenge of leveraging multidomain markers to learn the early progression of neurological disorders. We propose to integrate heterogeneous types of measures from multiple domains (e.g., discrete clinical symptoms, ordinal cognitive markers, continuous neuroimaging, and blood biomarkers) using a hierarchical Multilayer Exponential Family Factor (MEFF) model, in which the observations follow exponential family distributions driven by lower-dimensional latent factors. The latent factors are decomposed into factors shared across domains and domain-specific factors. The shared factors provide robust information for extensive phenotyping and for partitioning patients into clinically meaningful and biologically homogeneous subgroups, while the domain-specific factors capture the remaining variation unique to each domain. The MEFF model also captures the nonlinear trajectory of disease progression and orders the critical events of neurodegeneration measured by each marker. To overcome computational challenges, we fit the model using approximate inference techniques suited to large-scale data. We apply the method to data from the Parkinson's Progression Markers Initiative to integrate biological, clinical, and cognitive markers arising from heterogeneous distributions. The model learns lower-dimensional representations of Parkinson's disease (PD) and the temporal ordering of neurodegeneration in PD.
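To make the structure of the model concrete, the following is a minimal simulation sketch of the idea described in the abstract: each domain's observations are drawn from an exponential family distribution whose natural parameter combines shared latent factors with domain-specific ones. It is not the authors' implementation. The function name `simulate_meff`, the three domain names, the marker counts, the loading scales, and the choice of canonical links are all assumptions made for illustration. Ordinal markers, the nonlinear progression trajectory, and the approximate inference step are omitted.

```python
import numpy as np


def simulate_meff(n_subjects=200, n_shared=2, n_specific=1, seed=0):
    """Simulate toy multidomain data from a shared + domain-specific factor model.

    Illustrative only: domains, sizes, and loading scales are hypothetical.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical domains: name -> (number of markers, exponential family)
    domains = {
        "clinical": (4, "bernoulli"),   # discrete symptoms (present/absent)
        "imaging": (3, "gaussian"),     # continuous measures
        "biomarker": (5, "poisson"),    # count-valued measures (assumed)
    }

    # Shared latent factors: one row per subject, common to every domain
    shared = rng.normal(size=(n_subjects, n_shared))

    data = {}
    for name, (n_markers, family) in domains.items():
        # Domain-specific factors capture variation unique to this domain
        specific = rng.normal(size=(n_subjects, n_specific))
        w_shared = rng.normal(scale=0.5, size=(n_shared, n_markers))
        w_specific = rng.normal(scale=0.5, size=(n_specific, n_markers))

        # Natural parameter: linear in shared and domain-specific factors
        eta = shared @ w_shared + specific @ w_specific

        # Draw observations using the canonical link of each family
        if family == "gaussian":
            y = rng.normal(loc=eta, scale=1.0)
        elif family == "bernoulli":
            y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
        elif family == "poisson":
            y = rng.poisson(np.exp(eta))
        else:
            raise ValueError(f"unsupported family: {family}")
        data[name] = y

    return shared, data


if __name__ == "__main__":
    shared, data = simulate_meff(n_subjects=50, seed=1)
    for name, y in data.items():
        print(name, y.shape, y.dtype)
```

Because the shared factors enter every domain's natural parameter, they are the component one would cluster on to form patient subgroups, while each domain's own factors absorb variation that the other domains do not explain.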