Belloy Michael E, Le Guen Yann, Eger Sarah J, Napolioni Valerio, Greicius Michael D, He Zihuai
Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau-Paris Brain Institute-ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA.
Neurol Genet. 2022 Aug 11;8(5):e200012. doi: 10.1212/NXG.0000000000200012. eCollection 2022 Oct.
Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adopted a study design in which subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied quickly to robustly remove variant-level artifacts in the ADSP data.
We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5).
We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants.
We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.
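The abstract does not state which statistics the filters use. As a purely illustrative sketch, and not the authors' procedure, one simple way to detect variants whose allele frequencies differ strongly across sequencing centers is a Pearson chi-square test of homogeneity on per-center allele counts. The function names, the example counts, and the significance threshold below are all assumptions made for illustration. In practice such a test might be restricted to controls so that true case-control differences are not mistaken for artifacts.

```python
import math
from typing import Dict, Tuple

# Hypothetical input: per-center (alt_allele_count, ref_allele_count) for one variant.
Counts = Dict[str, Tuple[int, int]]


def heterogeneity_stat(counts: Counts) -> float:
    """Pearson chi-square statistic for a 2 x K table of allele counts
    (alt/ref by sequencing center). Larger values indicate stronger
    allele-frequency differences between centers."""
    total_alt = sum(alt for alt, _ in counts.values())
    total_ref = sum(ref for _, ref in counts.values())
    total = total_alt + total_ref
    if total_alt == 0 or total_ref == 0:
        return 0.0  # monomorphic variant: no heterogeneity to test
    stat = 0.0
    for alt, ref in counts.values():
        n = alt + ref
        if n == 0:
            continue  # center contributed no genotypes for this variant
        expected_alt = n * total_alt / total
        expected_ref = n * total_ref / total
        stat += (alt - expected_alt) ** 2 / expected_alt
        stat += (ref - expected_ref) ** 2 / expected_ref
    return stat


def is_center_artifact(counts: Counts, critical_value: float) -> bool:
    """Flag a variant when the statistic exceeds a caller-supplied critical value."""
    return heterogeneity_stat(counts) > critical_value


# With three centers the test has 2 degrees of freedom, where the chi-square
# survival function is exp(-x / 2), so the critical value for a given p-value
# threshold has a closed form. The threshold 1e-5 is an arbitrary assumption.
CRITICAL_DF2 = -2.0 * math.log(1e-5)

consistent = {"center_A": (10, 990), "center_B": (10, 990), "center_C": (12, 988)}
suspicious = {"center_A": (5, 995), "center_B": (6, 994), "center_C": (120, 880)}

print(is_center_artifact(consistent, CRITICAL_DF2))
print(is_center_artifact(suspicious, CRITICAL_DF2))
```

A count-based test like this needs only one pass over per-center allele counts, which is in keeping with the stated goal of a filter that can be applied quickly. For a different number of centers, the critical value would have to come from the chi-square distribution with the corresponding degrees of freedom.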