Evaluating methods for the analysis of rare variants in sequence data.

Author information

Alexander Luedtke, Scott Powers, Ashley Petersen, Alexandra Sitarik, Airat Bekmetjev, Nathan L. Tintle

Affiliations

Division of Applied Mathematics, Brown University, 182 George Street, Providence, RI 02912, USA.

Department of Statistics and Operations Research, 318 Hanes Hall, CB 3260, University of North Carolina, Chapel Hill, NC 27599-3260, USA.

Publication information

BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S119. doi: 10.1186/1753-6561-5-S9-S119.

DOI:10.1186/1753-6561-5-S9-S119

PMID:22373354

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287843/

Abstract

A number of rare variant statistical methods have been proposed for analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on the proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations on identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
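The way these three quantities combine into a low true discovery rate can be illustrated with a small calculation. The sketch below is not taken from the paper: the function name `true_discovery_rate` is hypothetical, and the false-positive rates in the loop are assumed values chosen only for illustration. The 1% causal fraction and the 19% ceiling on the true-positive rate come from the abstract.

```python
def true_discovery_rate(prior_causal, tpr, fpr):
    """Expected fraction of significant genes that are truly causal.

    prior_causal: fraction of all genes that are causal
    tpr: chance a causal gene is declared significant
    fpr: chance a noncausal gene is declared significant
    """
    true_hits = prior_causal * tpr            # causal and significant
    false_hits = (1.0 - prior_causal) * fpr   # noncausal but significant
    return true_hits / (true_hits + false_hits)


# From the abstract: about 1% of genes are causal, and TPR is below 19%,
# so using 0.19 gives an upper bound on the true discovery rate.
PRIOR_CAUSAL = 0.01
TPR_UPPER = 0.19

# Assumed false-positive rates (illustrative only, not reported values).
for fpr in (0.01, 0.05, 0.10):
    tdr = true_discovery_rate(PRIOR_CAUSAL, TPR_UPPER, fpr)
    print(f"FPR={fpr:.2f} -> TDR at most {tdr:.3f}")
```

Under these assumptions, a false-positive rate of 0.05 already caps the true discovery rate at roughly 3.7%, because noncausal genes outnumber causal ones by about 99 to 1. This is consistent with the abstract's statement that no method exceeded 5%.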
