Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China.
Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
Biometrics. 2023 Sep;79(3):2677-2690. doi: 10.1111/biom.13734. Epub 2022 Sep 1.
Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Because few effective treatments for AD exist, it is highly desirable to develop an accurate model that predicts the full disease progression profile from an individual's genetic characteristics, to support early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, comprising 1740 individuals with 8 million genetic variants. These data pose several challenges: large-scale genetic data, interval-censored outcomes due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model for interval-censored and left-truncated data and estimate its parameters through a sieve approach. We then propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network on interval-censored data (NN-IC) to construct a prediction model using the top variants identified from the genome-wide test. Comprehensive simulation studies show that the NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply the NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles. Data used in the preparation of this article were obtained from the ADNI database.
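To make the data structure concrete, the following minimal sketch shows how interval censoring and left truncation typically enter a likelihood. A subject whose event is known only to fall in the interval (L, R], and who entered the study event-free at time V, contributes [S(L | x) - S(R | x)] / S(V | x), where S is the conditional survival function. The sketch is an illustration under stated assumptions, not the authors' method: it substitutes a parametric Weibull proportional hazards model for the semiparametric transformation model and sieve estimator described above, and the names `survival`, `log_likelihood`, `shape`, and `scale` are hypothetical.

```python
import numpy as np


def survival(t, x, beta, shape, scale):
    """Survival S(t | x) under an assumed Weibull proportional hazards model.

    S(t | x) = exp(-(t / scale)**shape * exp(x @ beta)).
    t = np.inf gives S = 0, which encodes a right-censored upper bound.
    """
    t = np.asarray(t, dtype=float)
    cum_hazard = (t / scale) ** shape
    return np.exp(-cum_hazard * np.exp(x @ beta))


def log_likelihood(left, right, trunc, x, beta, shape, scale):
    """Log-likelihood for interval-censored, left-truncated observations.

    left, right : bounds of the interval (left, right] containing the event;
                  right = np.inf marks a right-censored subject.
    trunc       : study entry time; 0 means no left truncation.
    x           : covariate matrix of shape (n, p), e.g. variant dosages.
    """
    s_left = survival(left, x, beta, shape, scale)
    s_right = survival(right, x, beta, shape, scale)
    s_trunc = survival(trunc, x, beta, shape, scale)
    # Each subject contributes log{[S(L) - S(R)] / S(V)}.
    return np.sum(np.log(s_left - s_right) - np.log(s_trunc))


# Toy data (made up for illustration): three subjects, one covariate.
left = np.array([1.0, 2.0, 3.0])
right = np.array([2.0, np.inf, 4.5])   # second subject is right-censored
trunc = np.array([0.0, 0.5, 0.0])      # second subject entered late
x = np.array([[0.0], [1.0], [2.0]])
beta = np.array([0.1])

print(log_likelihood(left, right, trunc, x, beta, shape=1.2, scale=5.0))
```

Dividing by S(V | x) conditions on the subject being event-free at study entry. Omitting that term for late-entry subjects would treat them as if they had been followed from time zero.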