Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
J Neurodev Disord. 2024 Sep 12;16(1):54. doi: 10.1186/s11689-024-09571-8.
BACKGROUND: Common genetic variation has been shown to account for a large proportion of autism spectrum disorder (ASD) heritability. Polygenic scores for ASD (ASD-PGS) generated using the most recent discovery data, however, explain less variance than expected, despite showing significant associations with ASD and other ASD-related traits. Here, we investigate the extent to which information loss on the target study genome-wide microarray weakens the predictive power of the ASD-PGS.
METHODS: We studied genotype data from three cohorts of individuals with high familial liability for ASD: the Early Autism Risk Longitudinal Investigation (EARLI), Markers of Autism Risk in Babies-Learning Early Signs (MARBLES), and the Infant Brain Imaging Study (IBIS), and from one population-based sample, the Study to Explore Early Development Phase I (SEED I). Individuals were genotyped on different microarrays ranging from 1 to 5 million sites. Coverage of the top 88 genome-wide suggestive variants implicated in the discovery data was evaluated in all four studies before quality control (QC), after QC, and after imputation. We then created a novel method to assess coverage of the resulting ASD-PGS by correlating a PGS informed by a comprehensive list of variants with a PGS informed by only the available variants.
RESULTS: Prior to imputation, none of the four cohorts directly or indirectly covered all 88 variants among the measured genotype data. After imputation, the two cohorts genotyped on 5-million arrays reached full coverage. Analysis of our novel metric showed generally high genome-wide coverage across all four studies, but a greater number of SNPs informing the ASD-PGS did not result in improved coverage according to our metric.
LIMITATIONS: The studies we analyzed had modest sample sizes. Our analyses included only microarrays with more than 1 million sites, so smaller arrays such as Global Diversity and the PsychArray were not included. Our PGS metric for ASD is generalizable only to samples of European ancestries, though the coverage metric can be computed for traits that have sufficiently large discovery datasets in other ancestries.
CONCLUSIONS: We show that commonly used genotyping microarrays have incomplete coverage for common ASD variants, and imputation cannot always recover lost information. Our novel metric provides an intuitive approach to reporting information loss in PGS and an alternative to reporting the total number of SNPs included in the PGS. While applied only to ASD here, this metric can easily be used with other traits.
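The coverage metric described in METHODS (correlating a PGS built from a comprehensive variant list with a PGS built from only the available variants) might be sketched roughly as follows. This is an illustrative sketch on simulated data, not the authors' implementation: the function names (`polygenic_score`, `pearson`, `pgs_coverage`), the toy weights and dosages, and the assumption that both scores are computed in a single sample where every variant is observed are all hypothetical.

```python
import math
import random


def polygenic_score(dosages, weights, variants):
    """Weighted sum of allele dosages over the given variants."""
    return sum(weights[v] * dosages[v] for v in variants)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pgs_coverage(genotypes, weights, available):
    """Correlate the PGS from all weighted variants with the PGS
    restricted to the variants available on a given array.

    Assumption: `genotypes` comes from a sample in which every weighted
    variant is observed, so the comprehensive score can be computed.
    """
    full = [polygenic_score(g, weights, weights) for g in genotypes]
    partial = [polygenic_score(g, weights, available) for g in genotypes]
    return pearson(full, partial)


# Simulated toy data (hypothetical, for illustration only).
rng = random.Random(0)
variant_ids = [f"snp{i}" for i in range(200)]
weights = {v: rng.gauss(0.0, 0.05) for v in variant_ids}
genotypes = [
    {v: rng.choice([0, 1, 2]) for v in variant_ids} for _ in range(300)
]

# Pretend the array (after QC and imputation) covers 150 of the 200 variants.
available = set(variant_ids[:150])

coverage_all = pgs_coverage(genotypes, weights, set(variant_ids))
coverage_partial = pgs_coverage(genotypes, weights, available)
print(f"coverage with all variants: {coverage_all:.3f}")
print(f"coverage with 150 of 200 variants: {coverage_partial:.3f}")
```

Under these assumptions, a value near 1 indicates that the available variants recover almost all of the information in the comprehensive score, while lower values indicate information loss. Because the correlation depends on the weights of the missing variants rather than on how many are missing, it need not track the raw SNP count.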