Methods for collapsing multiple rare variants in whole-genome sequence data.

Authors

Yun Ju Sung, Keegan D. Korthauer, Michael D. Swartz, Corinne D. Engelman

Affiliations

Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, United States of America.

Publication details

Genet Epidemiol. 2014 Sep;38 Suppl 1:S13-20. doi: 10.1002/gepi.21820.

DOI: 10.1002/gepi.21820

PMID: 25112183

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4558905/

Abstract

Genetic Analysis Workshop 18 (GAW18) provided whole-genome sequence data in a pedigree-based sample, together with longitudinal phenotype data for hypertension and related traits, presenting an excellent opportunity to evaluate analysis choices. We summarize the nine contributions to the working group on collapsing methods, which evaluated various approaches for analyzing multiple rare variants. One contributor defined a variant prioritization scheme, whereas the remaining eight evaluated statistical methods for association analysis. Six contributors chose the gene as the genomic region within which to collapse variants, whereas three chose nonoverlapping windows spanning the entire genome. The statistical methods covered most published approaches, including well-established burden tests, variance-component tests, and recently developed hybrid approaches. Lesser-known methods, such as functional principal components analysis, higher criticism, and homozygosity association, as well as some newly introduced methods, were also used. We found that the performance of these methods depended on the characteristics of the genomic region, such as the effect sizes and directions of the variants under consideration. Except for MAP4 and FLT3, the performance of all statistical methods in identifying rare causal variants was disappointingly poor, with overall power almost identical to the type I error rate. This poor performance may have arisen from a combination of (1) small sample size, (2) small effects of most causal variants, each explaining a small fraction of the variance, (3) use of incomplete annotation information, and (4) linkage disequilibrium between causal variants in a gene and noncausal variants in nearby genes. Our findings illustrate the challenges of analyzing rare variants identified from sequence data.
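The abstract contrasts burden tests with variance-component tests and reports that performance depended on the effect sizes and directions of the variants in a region. The following is a minimal, hypothetical Python sketch of that contrast. It is not any contributor's implementation. It assumes unrelated individuals, a quantitative trait, genotypes coded as 0/1/2 minor-allele counts, unit variant weights, and illustrative helper names (`rare_variant_indices`, `nonoverlapping_windows`, `burden_test`, `skat_q`) and thresholds. It also ignores the pedigree structure of the GAW18 data. A burden test sums rare alleles per person before testing, so variants with opposite effects can cancel. A SKAT-style statistic squares per-variant scores before summing, so they do not.

```python
# Toy sketch, not a published implementation. Assumptions (all hypothetical):
# unrelated individuals, a quantitative trait, genotypes coded as 0/1/2
# minor-allele counts, unit variant weights, illustrative MAF/window settings.
from math import erfc, sqrt


def rare_variant_indices(genotypes, maf_threshold):
    """Columns whose minor allele frequency is above 0 and below the threshold."""
    n = len(genotypes)
    keep = []
    for j in range(len(genotypes[0])):
        freq = sum(row[j] for row in genotypes) / (2 * n)
        if 0 < min(freq, 1 - freq) < maf_threshold:
            keep.append(j)
    return keep


def nonoverlapping_windows(positions, width):
    """Group variant columns into fixed, nonoverlapping windows of `width` bp."""
    windows = {}
    for j, pos in enumerate(positions):
        windows.setdefault((pos // width) * width, []).append(j)
    return windows


def burden_test(genotypes, y, cols):
    """Collapse rare alleles into one score per person, then a 1-df score test."""
    n = len(y)
    burden = [sum(row[j] for j in cols) for row in genotypes]
    y_bar, b_bar = sum(y) / n, sum(burden) / n
    u = sum((yi - y_bar) * (bi - b_bar) for yi, bi in zip(y, burden))
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    ss_b = sum((bi - b_bar) ** 2 for bi in burden)
    if sigma2 == 0 or ss_b == 0:
        return 0.0, 1.0
    stat = u ** 2 / (sigma2 * ss_b)    # approximately chi-square with 1 df
    return stat, erfc(sqrt(stat / 2))  # upper-tail chi-square(1) p-value


def skat_q(genotypes, y, cols, weights=None):
    """SKAT-style Q = sum_j w_j * S_j**2, S_j = per-variant score (no p-value)."""
    y_bar = sum(y) / len(y)
    resid = [yi - y_bar for yi in y]
    q = 0.0
    for k, j in enumerate(cols):
        w = 1.0 if weights is None else weights[k]
        s_j = sum(r * row[j] for r, row in zip(resid, genotypes))
        q += w * s_j ** 2
    return q


# Toy data: 10 people, 3 variants. Variant 0 raises the trait, variant 1
# lowers it by the same amount, variant 2 is common (MAF 0.25).
G = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]] + [[0, 0, 0]] * 5
Y = [3.0, 3.0, -3.0, -3.0] + [0.0] * 6
POS = [1_000, 1_500, 2_500]

if __name__ == "__main__":
    rare = rare_variant_indices(G, maf_threshold=0.2)  # toy threshold
    for start, cols in nonoverlapping_windows(POS, width=1_000).items():
        region = [j for j in cols if j in rare]
        if region:
            # prints: 1000 (0.0, 1.0) 72.0
            print(start, burden_test(G, Y, region), skat_q(G, Y, region))
```

In this toy region the two rare variants have equal and opposite effects, so the burden score is exactly zero (p = 1) while Q = 72. The sketch returns only Q. A real SKAT p-value compares Q against a mixture of chi-square distributions (for example, via Davies' method), and pedigree data such as GAW18 would additionally require a model that accounts for relatedness.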

References

Evaluation of gene-based association tests for analyzing rare variants using Genetic Analysis Workshop 18 data.

BMC Proc. 2014 Jun 17;8(Suppl 1):S9. doi: 10.1186/1753-6561-8-S1-S9. eCollection 2014.

A comparison of two collapsing methods in different approaches.

BMC Proc. 2014 Jun 17;8(Suppl 1):S8. doi: 10.1186/1753-6561-8-S1-S8. eCollection 2014.

Considering interactive effects in the identification of influential regions with extremely rare variants via fixed bin approach.

BMC Proc. 2014 Jun 17;8(Suppl 1):S7. doi: 10.1186/1753-6561-8-S1-S7. eCollection 2014.

Analysis of homozygosity disequilibrium using whole-genome sequencing data.

BMC Proc. 2014 Jun 17;8(Suppl 1):S15. doi: 10.1186/1753-6561-8-S1-S15. eCollection 2014.

Higher criticism approach to detect rare variants using whole genome sequencing data.

BMC Proc. 2014 Jun 17;8(Suppl 1):S14. doi: 10.1186/1753-6561-8-S1-S14. eCollection 2014.

Small sample properties of rare variant analysis methods.

BMC Proc. 2014 Jun 17;8(Suppl 1):S13. doi: 10.1186/1753-6561-8-S1-S13. eCollection 2014.

Whole genome sequence analysis of the simulated systolic blood pressure in Genetic Analysis Workshop 18 family data: long-term average and collapsing methods.

BMC Proc. 2014 Jun 17;8(Suppl 1):S12. doi: 10.1186/1753-6561-8-S1-S12. eCollection 2014.

Genetic Analysis Workshop 18 single-nucleotide variant prioritization based on protein impact, sequence conservation, and gene annotation.

BMC Proc. 2014 Jun 17;8(Suppl 1):S11. doi: 10.1186/1753-6561-8-S1-S11. eCollection 2014.

Rare variant analysis of blood pressure phenotypes in the Genetic Analysis Workshop 18 whole genome sequencing data using sequence kernel association test.

BMC Proc. 2014 Jun 17;8(Suppl 1):S10. doi: 10.1186/1753-6561-8-S1-S10. eCollection 2014.

Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees.

BMC Proc. 2014 Jun 17;8(Suppl 1):S2. doi: 10.1186/1753-6561-8-S1-S2. eCollection 2014.