Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies.

Author affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA.

Publication information

Am J Hum Genet. 2019 May 2;104(5):802-814. doi: 10.1016/j.ajhg.2019.03.002. Epub 2019 Apr 12.

DOI:10.1016/j.ajhg.2019.03.002

PMID:30982610

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6507043/

Abstract

Whole-genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses have limited power for rare variants, so variant-set-based analyses are commonly used instead. However, existing variant-set-based approaches require the genetic regions to be analyzed to be specified in advance; hence, they are not directly applicable to WGS data, where intergenic and intronic regions contain a massive number of non-coding variants. The commonly used sliding-window method requires fixed window sizes to be pre-specified. The appropriate sizes are often unknown a priori and difficult to specify in practice, and any fixed choice is limiting because the sizes of genetic-association regions are likely to vary across the genome and across phenotypes. We propose a computationally efficient and dynamic scan-statistic method (Scan the Genome [SCANG]) for analyzing WGS data; this method flexibly detects the sizes and the locations of rare-variant association regions without requiring a fixed window size to be specified in advance. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected sizes of rare-variant association regions to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative methods for detecting rare-variant associations while controlling the genome-wise type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
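
The contrast the abstract draws between a fixed sliding window and a scan over a range of window sizes can be illustrated with a toy sketch. This is not SCANG's implementation: the burden-style statistic (n times the squared correlation between a window's minor-allele count and the phenotype), the permutation-based threshold, and every function name and parameter below are assumptions chosen for illustration. The only ideas taken from the abstract are that window sizes vary, that a single genome-wide threshold controls the type I error rate, and that the threshold should reflect the correlation (linkage disequilibrium) among variants.

```python
import random

def window_burdens(G, min_size, max_size):
    """Map each window (start, end) to its centered, unit-norm burden vector.

    G is a list of per-sample genotype rows (minor-allele counts).
    """
    n, p = len(G), len(G[0])
    # Per-sample prefix sums give each window's burden in O(1).
    prefix = [[0] * (p + 1) for _ in range(n)]
    for i in range(n):
        for j in range(p):
            prefix[i][j + 1] = prefix[i][j] + G[i][j]
    out = {}
    for size in range(min_size, max_size + 1):
        for start in range(p - size + 1):
            end = start + size
            b = [prefix[i][end] - prefix[i][start] for i in range(n)]
            mean = sum(b) / n
            c = [x - mean for x in b]
            norm = sum(x * x for x in c) ** 0.5
            if norm > 0:  # skip windows whose burden does not vary
                out[(start, end)] = [x / norm for x in c]
    return out

def scan_stats(burdens, y):
    """Toy statistic per window: n * r^2 (assumes y is not constant)."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    ss = sum(v * v for v in yc)
    return {w: n * sum(a * b for a, b in zip(c, yc)) ** 2 / ss
            for w, c in burdens.items()}

def dynamic_scan(G, y, min_size=4, max_size=10, n_perm=100, alpha=0.05, seed=0):
    """Scan all window sizes; return (threshold, hits sorted by statistic)."""
    rng = random.Random(seed)
    burdens = window_burdens(G, min_size, max_size)
    observed = scan_stats(burdens, y)
    # Permuting the phenotype leaves the genotype matrix untouched, so the
    # null distribution of the maximum reflects correlation among windows.
    maxima = []
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        maxima.append(max(scan_stats(burdens, y_perm).values()))
    maxima.sort()
    threshold = maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
    hits = sorted(((s, w) for w, s in observed.items() if s > threshold),
                  reverse=True)
    return threshold, hits

if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 120, 40
    # Simulated rare variants; variants 15..21 raise the phenotype.
    G = [[1 if rng.random() < 0.03 else 0 for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) + 1.5 * sum(row[15:22]) for row in G]
    threshold, hits = dynamic_scan(G, y)
    print("threshold:", round(threshold, 2))
    for stat, (start, end) in hits[:3]:
        print("window", start, end, "statistic", round(stat, 2))
```

A fixed sliding window corresponds to calling `dynamic_scan` with `min_size == max_size`. Widening the range lets the detected region adapt to the signal, at the cost of a larger search space and therefore a higher threshold.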

Similar articles

Simultaneous Detection of Signal Regions Using Quadratic Scan Statistics With Applications to Whole Genome Association Studies.

J Am Stat Assoc. 2022;117(538):823-834. doi: 10.1080/01621459.2020.1822849. Epub 2020 Nov 12.

A statistical framework for multi-trait rare variant analysis in large-scale whole-genome sequencing studies.

Nat Comput Sci. 2025 Feb;5(2):125-143. doi: 10.1038/s43588-024-00764-8. Epub 2025 Feb 7.

A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies.

Nat Methods. 2022 Dec;19(12):1599-1611. doi: 10.1038/s41592-022-01640-x. Epub 2022 Oct 27.

eSCAN: scan regulatory regions for aggregate association testing using whole-genome sequencing data.

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab497.

Identification of putative causal loci in whole-genome sequencing data via knockoff statistics.

Nat Commun. 2021 May 25;12(1):3152. doi: 10.1038/s41467-021-22889-4.

On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows.

Genet Epidemiol. 2017 May;41(4):332-340. doi: 10.1002/gepi.22040. Epub 2017 Mar 20.

Quantitative phenotype scan statistic (QPSS) reveals rare variant associations with Alzheimer's disease endophenotypes.

BMC Med Genet. 2020 May 15;21(1):106. doi: 10.1186/s12881-020-01046-6.

Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed whole-genome sequencing study.

Am J Hum Genet. 2023 Oct 5;110(10):1704-1717. doi: 10.1016/j.ajhg.2023.09.003.

Enhancing the power to detect low-frequency variants in genome-wide screens.

Genetics. 2014 Apr;196(4):1293-302. doi: 10.1534/genetics.113.160739. Epub 2014 Feb 4.

Cited by

Leveraging functional annotations to map rare variants associated with Alzheimer disease with gruyere.

Am J Hum Genet. 2025 Aug 13. doi: 10.1016/j.ajhg.2025.07.016.

Noncoding rare variant associations with blood traits in 166,740 UK Biobank genomes.

Nat Genet. 2025 Aug 6. doi: 10.1038/s41588-025-02288-x.

A statistical framework for multi-trait rare variant analysis in large-scale whole-genome sequencing studies.

Nat Comput Sci. 2025 Feb;5(2):125-143. doi: 10.1038/s43588-024-00764-8. Epub 2025 Feb 7.

Assessment of the functionality and usability of open-source rare variant analysis pipelines.

Brief Bioinform. 2025 Feb 5;26(1). doi: 10.1093/bib/bbaf044.

RetroFun-RVS: A Retrospective Family-Based Framework for Rare Variant Analysis Incorporating Functional Annotations.

Genet Epidemiol. 2025 Mar;49(2):e70001. doi: 10.1002/gepi.70001.

Leveraging functional annotations to map rare variants associated with Alzheimer's disease with gruyere.

medRxiv. 2025 Mar 4:2024.12.06.24318577. doi: 10.1101/2024.12.06.24318577.

Whole-genome sequencing reveals rare variants associated with gout in Taiwanese males.

Front Genet. 2024 Sep 25;15:1423714. doi: 10.3389/fgene.2024.1423714. eCollection 2024.

Whole-genome sequencing in 333,100 individuals reveals rare non-coding single variant and aggregate associations with height.

Nat Commun. 2024 Oct 3;15(1):8549. doi: 10.1038/s41467-024-52579-w.

The Genetic Determinants and Genomic Consequences of Non-Leukemogenic Somatic Point Mutations.

medRxiv. 2024 Aug 26:2024.08.22.24312319. doi: 10.1101/2024.08.22.24312319.

Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection.

J Transl Med. 2024 Apr 16;22(1):356. doi: 10.1186/s12967-024-05090-1.
