Li Ting, Yu Yang, Marron J S, Zhu Hongtu
School of Statistics and Management, Shanghai University of Finance and Economics.
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill.
Ann Appl Stat. 2024 Mar;18(1):704-728. doi: 10.1214/23-aoas1808. Epub 2024 Jan 31.
This paper is motivated by the joint analysis of genetic, imaging, and clinical (GIC) data collected in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We propose a partially functional linear regression (PFLR) framework to map high-dimensional GIC-related pathways for Alzheimer's disease (AD). We develop a joint model selection and estimation procedure by embedding the imaging data in a reproducing kernel Hilbert space and imposing a penalty on the coefficients of the genetic variables. We apply the proposed method to the ADNI dataset to identify important features from tens of thousands of genetic polymorphisms (reduced from millions by a preprocessing step) and to study the effects of a set of informative genetic variants and the baseline hippocampus surface on 13 future cognitive scores. We also explore the shared and distinct heritability patterns of these cognitive scores. The results suggest that both the hippocampal and the genetic data have heterogeneous effects on different scores, with the trend that the values of both hippocampi are negatively associated with the severity of cognitive deficits. Polygenic effects are observed for all 13 cognitive scores. The well-known APOE4 genotype explains only a small part of cognitive function. Shared genetic etiology exists; however, greater genetic heterogeneity exists within disease classifications after accounting for baseline diagnosis status. These analyses are useful for further investigation of the functional mechanisms of AD progression.
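For orientation, a generic partially functional linear regression model and its penalized estimator can be written as below. This is a minimal sketch of the standard form, not the paper's exact formulation: the abstract states only that the imaging data are embedded in a reproducing kernel Hilbert space and that a penalty is placed on the genetic coefficients. All symbols here are assumptions introduced for illustration: the response y_i (a cognitive score), the scalar covariates x_i (genetic variants), the functional covariate z_i (hippocampal surface measure over a domain T), the RKHS H(K) with norm ||.||_K, the tuning parameters lambda and tau, and the unspecified sparsity penalty p_tau.

```latex
% Model (assumed generic PFLR form):
\[
  y_i = x_i^{\top}\beta + \int_{\mathcal{T}} z_i(t)\,\alpha(t)\,dt + \varepsilon_i,
  \qquad i = 1,\dots,n .
\]
% Penalized estimator (assumed): RKHS roughness penalty on the functional
% coefficient alpha, sparsity penalty on the scalar coefficients beta.
\[
  (\hat\beta,\hat\alpha)
  = \operatorname*{arg\,min}_{\beta\in\mathbb{R}^{p},\;\alpha\in\mathcal{H}(K)}
    \frac{1}{n}\sum_{i=1}^{n}
      \Bigl( y_i - x_i^{\top}\beta - \int_{\mathcal{T}} z_i(t)\,\alpha(t)\,dt \Bigr)^{2}
    + \lambda\,\lVert \alpha \rVert_{K}^{2}
    + \sum_{j=1}^{p} p_{\tau}\bigl(\lvert \beta_j \rvert\bigr).
\]
```

In this reading, the sparsity penalty performs the selection among genetic variables while the RKHS norm controls the smoothness of the imaging coefficient function, which is what makes the selection and the estimation joint.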