NGS allele counts versus called genotypes for testing genetic association.

Authors

González Silos Rosa, Fischer Christine, Lorenzo Bermejo Justo

Affiliations

Institute of Medical Biometry, University of Heidelberg, 69120, Germany.

Institute of Human Genetics, University of Heidelberg, 69120, Germany.

Publication

Comput Struct Biotechnol J. 2022 Jul 11;20:3729-3733. doi: 10.1016/j.csbj.2022.07.016. eCollection 2022.

Abstract

RNA sequence data are commonly summarized as read counts. By contrast, so far there is no alternative to genotype calling for investigating the relationship between genetic variants determined by next-generation sequencing (NGS) and a phenotype of interest. Here we propose and evaluate the direct analysis of allele counts for genetic association tests. Specifically, we assess the potential advantage over called genotypes of the ratio of alternative allele counts to the total number of reads aligned at a specific position of the genome (coverage). We simulated association studies based on NGS data from HapMap individuals. Genotype quality scores and allele counts were simulated using NGS data from the Personal Genome Project. Real data from the 1000 Genomes Project were also used to compare the two competing approaches. The average proportions of probability values lower than or equal to 0.05 amounted to 0.0496 for called genotypes and 0.0485 for the ratio of alternative allele counts to coverage in the null scenario, and to 0.69 for called genotypes and 0.75 for the ratio of alternative allele counts to coverage in the alternative scenario (9% power increase). The advantage in statistical power of the novel approach increased with decreasing coverage, with decreasing genotype quality and with decreasing allele frequency, reaching a 124% power increase for variants with a minor allele frequency lower than 0.05. We provide computer code in R to implement the novel approach, which does not preclude the use of complementary data quality filters before or after identification of the most promising association signals.

AUTHOR SUMMARY

Genetic association tests usually rely on called genotypes. We postulate here that the direct analysis of allele counts from sequence data improves the quality of statistical inference. To evaluate this hypothesis, we investigate simulated and real data using distinct statistical approaches. We demonstrate that association tests based on allele counts rather than called genotypes achieve higher statistical power with controlled type I error rates.
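
To make the comparison concrete, here is a minimal Python sketch of the two competing predictors described in the abstract: the called genotype and the ratio of alternative allele counts to coverage. It is not the authors' R code. The toy data, the function names `allele_ratio` and `permutation_p_value`, and the choice of a permutation test on the case/control difference in means are all assumptions made for illustration; the abstract does not say which association test was used.

```python
import random
from statistics import mean


def allele_ratio(alt_count, coverage):
    """Ratio of alternative allele reads to total reads at one position."""
    if coverage <= 0:
        return None  # no aligned reads: the ratio is undefined
    return alt_count / coverage


def permutation_p_value(predictor, is_case, n_perm=2000, seed=1):
    """Two-sided permutation p-value for the case/control difference in means.

    Illustrative stand-in only; not the test used in the paper.
    """
    def mean_diff(labels):
        cases = [x for x, c in zip(predictor, labels) if c]
        controls = [x for x, c in zip(predictor, labels) if not c]
        return mean(cases) - mean(controls)

    observed = abs(mean_diff(is_case))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between predictor and phenotype
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data, one tuple per individual:
# (alternative allele reads, coverage, called genotype 0/1/2, case status)
samples = [
    (0, 8, 0, False), (1, 9, 0, False), (4, 10, 1, False), (0, 6, 0, False),
    (5, 9, 1, True), (7, 8, 2, True), (3, 7, 1, True), (6, 6, 2, True),
]
ratios = [allele_ratio(a, c) for a, c, _, _ in samples]
genotypes = [g for _, _, g, _ in samples]
status = [s for _, _, _, s in samples]

p_ratio = permutation_p_value(ratios, status)
p_genotype = permutation_p_value(genotypes, status)
print(f"allele-count ratio: p = {p_ratio:.4f}")
print(f"called genotype:    p = {p_genotype:.4f}")
```

In the toy data, the second individual has one alternative read out of nine and a called genotype of 0. The ratio keeps that read-level information while the called genotype discards it, which is the kind of difference the abstract evaluates across coverage, genotype quality and allele frequency.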
