Evaluating methods for the analysis of rare variants in sequence data.

Author information

Luedtke Alexander, Powers Scott, Petersen Ashley, Sitarik Alexandra, Bekmetjev Airat, Tintle Nathan L

Affiliations

Division of Applied Mathematics, Brown University, 182 George Street, Providence, RI 02912, USA.

Department of Statistics and Operations Research, 318 Hanes Hall, CB 3260, University of North Carolina, Chapel Hill, NC 27599-3260, USA.

Publication information

BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S119. doi: 10.1186/1753-6561-5-S9-S119.

Abstract

A number of rare variant statistical methods have been proposed for analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on the proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations in identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
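
The way the three quantities in the abstract combine can be illustrated with a short calculation. The sketch below is not the authors' code; it assumes the true discovery rate is the expected share of flagged genes that are causal, computed from the causal fraction (about 1%), the true-positive rate (below 19%), and a false-positive rate. The function name and the false-positive values tried are hypothetical, since the abstract reports no specific false-positive figures.

```python
def true_discovery_rate(causal_fraction, true_positive_rate, false_positive_rate):
    """Expected share of flagged genes that are truly causal."""
    # Fraction of all genes that are causal AND flagged.
    true_hits = causal_fraction * true_positive_rate
    # Fraction of all genes that are noncausal AND flagged.
    false_hits = (1.0 - causal_fraction) * false_positive_rate
    total = true_hits + false_hits
    return true_hits / total if total > 0 else 0.0


# From the abstract: about 1% of genes causal, true-positive rate below 19%
# (0.19 is used here as an upper bound).
# ASSUMPTION: the false-positive rates below are illustrative only.
for fpr in (0.01, 0.05, 0.10):
    tdr = true_discovery_rate(0.01, 0.19, fpr)
    print(f"FPR={fpr:.2f} -> TDR={tdr:.3f}")
```

Under these assumptions, even a false-positive rate held at a nominal 5% gives a true discovery rate of roughly 3.7%, and any inflation above that level pushes it lower. This is in line with the abstract's statement that no method exceeds a 5% true discovery rate.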
