On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows.

Authors

Loehlein Fier Heide, Prokopenko Dmitry, Hecker Julian, Cho Michael H, Silverman Edwin K, Weiss Scott T, Tanzi Rudolph E, Lange Christoph

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.

Working Group of Genomic Mathematics, University of Bonn, Bonn, Germany.

Publication information

Genet Epidemiol. 2017 May;41(4):332-340. doi: 10.1002/gepi.22040. Epub 2017 Mar 20.

DOI:10.1002/gepi.22040

PMID:28318110

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5525021/

Abstract

For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Existing analysis approaches for WGS data define the tested regions by either sliding or consecutive windows of fixed size along the variants. Compared to sliding-window approaches, a meaningful grouping of nearby variants into consecutive regions is likely to yield fewer tested regions. Compared to consecutive, fixed-window approaches, our approach is more likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to cluster physically in specific regions along the chromosome, identifying meaningful groups of nearby variants could lead to a power gain for association analysis. Assuming an inhomogeneous Poisson process, our algorithm defines consecutive genomic regions based on the physical positions of the variants and groups nearby variants together. Because parameters are estimated locally, the algorithm takes the varying variant density along the chromosome into account and provides a locally optimal partitioning of variants into consecutive regions. We discuss the theoretical advantages of our algorithm over existing window-based approaches and demonstrate its performance in a simulation study and in an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and formally achieves genome-wide significance. Software implementation: an implementation of the algorithm in R is available at https://github.com/heidefier/cluster_wgs_data.
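The idea of cutting the genome into consecutive, nonoverlapping regions from variant positions alone can be illustrated with a small sketch. This is not the authors' algorithm (their R implementation is at the GitHub link in the abstract); it is a simplified stand-in written in Python that assumes gaps between adjacent variants are exponentially distributed with a rate estimated from neighbouring gaps, and starts a new window wherever a gap is improbably large under that local rate. The function name `partition_variants` and the parameters `k` and `alpha` are hypothetical choices made for this example.

```python
import math
from typing import List


def partition_variants(positions: List[int], k: int = 3, alpha: float = 0.05) -> List[List[int]]:
    """Split variant positions into consecutive, nonoverlapping windows.

    Illustrative rule (an assumption, not the published method): the gap
    between two adjacent variants starts a new window when it is improbably
    large under an exponential gap distribution whose rate is estimated from
    up to k gaps on each side of it.
    """
    if not positions:
        return []
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    windows = [[pos[0]]]
    for i, gap in enumerate(gaps):
        # Local neighbourhood of gaps, excluding the gap under test.
        neighbours = gaps[max(0, i - k):i] + gaps[i + 1:i + 1 + k]
        total = sum(neighbours)
        if neighbours and total > 0:
            rate = len(neighbours) / total      # local estimate of variant density
            tail_prob = math.exp(-rate * gap)   # P(gap >= observed) at that density
        else:
            tail_prob = 1.0                     # no usable local information: never split
        if tail_prob < alpha:
            windows.append([pos[i + 1]])        # unusually large gap: open a new window
        else:
            windows[-1].append(pos[i + 1])      # otherwise extend the current window
    return windows


if __name__ == "__main__":
    demo = [100, 110, 120, 130, 140, 5000, 5010, 5020, 5030, 20000, 20010, 20020]
    for window in partition_variants(demo, k=3, alpha=0.05):
        print(window[0], window[-1], len(window))
```

Because the rate is re-estimated around every gap, a spacing that would be unremarkable in a sparse stretch can still trigger a boundary in a dense one, which is the intuition behind estimating parameters locally. Every variant lands in exactly one window, so the number of tested regions cannot exceed the number of variants.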

Similar articles

Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies.

Am J Hum Genet. 2019 May 2;104(5):802-814. doi: 10.1016/j.ajhg.2019.03.002. Epub 2019 Apr 12.

Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease.

BMC Med Inform Decis Mak. 2017 May 18;17(Suppl 1):61. doi: 10.1186/s12911-017-0454-0.

Estimating genome-wide significance for whole-genome sequencing studies.

Genet Epidemiol. 2014 May;38(4):281-90. doi: 10.1002/gepi.21797. Epub 2014 Feb 14.

locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies.

Genet Epidemiol. 2021 Feb;45(1):82-98. doi: 10.1002/gepi.22356. Epub 2020 Sep 14.

Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation.

BMC Genomics. 2016 Aug 24;17(1):676. doi: 10.1186/s12864-016-2966-x.

Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

Genet Sel Evol. 2017 Jan 16;49(1):8. doi: 10.1186/s12711-016-0277-y.

Alternate-locus aware variant calling in whole genome sequencing.

Genome Med. 2016 Dec 13;8(1):130. doi: 10.1186/s13073-016-0383-z.

DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease.

PLoS One. 2017 Jul 24;12(7):e0179364. doi: 10.1371/journal.pone.0179364. eCollection 2017.

Region-based analysis of rare genomic variants in whole-genome sequencing datasets reveal two novel Alzheimer's disease-associated genes: DTNB and DLG2.

Mol Psychiatry. 2022 Apr;27(4):1963-1969. doi: 10.1038/s41380-022-01475-0. Epub 2022 Mar 4.

Cited by

Idiopathic Pulmonary Fibrosis Is Associated with Common Genetic Variants and Limited Rare Variants.

Am J Respir Crit Care Med. 2023 May 1;207(9):1194-1202. doi: 10.1164/rccm.202207-1331OC.

Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development.

Alzheimers Dement. 2021 Sep;17(9):1509-1527. doi: 10.1002/alz.12319. Epub 2021 Apr 2.

A unifying framework for rare variant association testing in family-based designs, including higher criticism approaches, SKATs, and burden tests.

Bioinformatics. 2021 Apr 1;36(22-23):5432-5438. doi: 10.1093/bioinformatics/btaa1055.

Whole Genome Sequencing Identifies CRISPLD2 as a Lung Function Gene in Children With Asthma.

Chest. 2019 Dec;156(6):1068-1079. doi: 10.1016/j.chest.2019.08.2202. Epub 2019 Sep 23.

Genetic Advances in Chronic Obstructive Pulmonary Disease. Insights from COPDGene.

Am J Respir Crit Care Med. 2019 Sep 15;200(6):677-690. doi: 10.1164/rccm.201808-1455SO.

Whole-Genome Sequencing in Severe Chronic Obstructive Pulmonary Disease.

Am J Respir Cell Mol Biol. 2018 Nov;59(5):614-622. doi: 10.1165/rcmb.2018-0088OC.

Family-based tests for associating haplotypes with general phenotype data: Improving the FBAT-haplotype algorithm.

Genet Epidemiol. 2018 Feb;42(1):123-126. doi: 10.1002/gepi.22094. Epub 2017 Nov 21.
