Jeong Seokho, Shivakumar Manu, Jung Sang-Hyuk, Won Hong-Hee, Nho Kwangsik, Huang Heng, Davatzikos Christos, Saykin Andrew J, Thompson Paul M, Shen Li, Kim Young Jin, Kim Bong-Jo, Lee Seunggeun, Kim Dokyoon
Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea.
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Alzheimers Dement. 2025 Apr;21(4):e70109. doi: 10.1002/alz.70109.
Numerous studies of Alzheimer's disease polygenic risk scores (PRSs) overlook sample overlap between the International Genomics of Alzheimer's Project (IGAP) and target datasets such as the Alzheimer's Disease Neuroimaging Initiative (ADNI).
To address this, we developed an overlap-adjusted PRS (OA PRS) and tested it on simulated data, varying the training, testing, and overlap proportions to assess the bias arising in different scenarios. After using OA PRS to adjust for sample overlap bias in the simulations, we applied it to the IGAP and ADNI datasets and validated the results through visual diagnostics.
OA PRS effectively adjusted for sample overlap in all simulation scenarios, as well as for IGAP and ADNI. The original IGAP PRS showed an inflated area under the receiver operating characteristic curve (AUROC: 0.915) on overlapping samples. OA PRS reduced the AUROC to 0.726, closely aligning with the AUROC of non-overlapping samples (0.712). Further, visual diagnostics confirmed the effectiveness of our adjustments.
With OA PRS, we were able to adjust the IGAP summary statistics-based PRS for the overlapping ADNI samples, allowing the full dataset to be used without the risk of overfitting.
Sample overlap between large Alzheimer's disease (AD) cohorts introduces overfitting bias when using AD polygenic risk scores (PRSs). This study highlighted the effectiveness of overlap-adjusted PRS (OA PRS) in mitigating overfitting and improving the accuracy of PRS estimations. New PRSs based on adjusted effect sizes showed increased power in associations with clinical features.
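The kind of AUROC inflation described above can be illustrated with a toy simulation. The sketch below is not the OA PRS method and does not use IGAP or ADNI data. It is a minimal illustration under stated assumptions: genotypes are drawn independently of case status (a pure null), SNP weights are simple case-control differences in mean allele count, and all names (`simulate`, `auroc`, the sample sizes, the allele frequency of 0.3) are hypothetical. Because the null has no true signal, any AUROC above 0.5 on the overlapping samples is attributable to overfitting alone.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC: P(random case scores higher than random control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def simulate(n_train=200, n_test=200, n_snps=500, maf=0.3, seed=0):
    """Return (AUROC on overlapping samples, AUROC on independent samples).

    Assumption: genotypes carry no true signal, so an unbiased AUROC is ~0.5.
    """
    rng = random.Random(seed)

    def cohort(n):
        # Allele counts 0/1/2, drawn independently of case status.
        geno = [[(rng.random() < maf) + (rng.random() < maf)
                 for _ in range(n_snps)] for _ in range(n)]
        labels = [i % 2 for i in range(n)]  # balanced cases and controls
        return geno, labels

    train_g, train_y = cohort(n_train)
    test_g, test_y = cohort(n_test)

    # "GWAS" on the training cohort: per-SNP difference in mean allele count.
    n_case = sum(train_y)
    n_ctrl = n_train - n_case
    weights = []
    for j in range(n_snps):
        case_mean = sum(g[j] for g, y in zip(train_g, train_y) if y) / n_case
        ctrl_mean = sum(g[j] for g, y in zip(train_g, train_y) if not y) / n_ctrl
        weights.append(case_mean - ctrl_mean)

    def prs(geno):
        return [sum(w * a for w, a in zip(weights, g)) for g in geno]

    # Scoring the training cohort mimics 100% overlap between the
    # discovery sample and the target sample.
    auc_overlap = auroc(prs(train_g), train_y)
    auc_independent = auroc(prs(test_g), test_y)
    return auc_overlap, auc_independent


if __name__ == "__main__":
    overlap, independent = simulate()
    print(f"AUROC, overlapping samples: {overlap:.3f}")
    print(f"AUROC, independent samples: {independent:.3f}")
```

In this toy setting the score evaluated on the discovery samples separates cases from controls almost perfectly, while the same score on independent samples stays near chance. The gap is the bias that an overlap adjustment is meant to remove.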