NGS allele counts versus called genotypes for testing genetic association.

Author information

González Silos Rosa, Fischer Christine, Lorenzo Bermejo Justo

Affiliations

Institute of Medical Biometry, University of Heidelberg, 69120, Germany.

Institute of Human Genetics, University of Heidelberg, 69120, Germany.

Publication information

Comput Struct Biotechnol J. 2022 Jul 11;20:3729-3733. doi: 10.1016/j.csbj.2022.07.016. eCollection 2022.

DOI:10.1016/j.csbj.2022.07.016
PMID:35891781
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9294184/
Abstract

RNA sequence data are commonly summarized as read counts. By contrast, so far there is no alternative to genotype calling for investigating the relationship between genetic variants determined by next-generation sequencing (NGS) and a phenotype of interest. Here we propose and evaluate the direct analysis of allele counts for genetic association tests. Specifically, we assess the potential advantage of the ratio of alternative allele counts to the total number of reads aligned at a specific position of the genome (coverage) over called genotypes. We simulated association studies based on NGS data from HapMap individuals. Genotype quality scores and allele counts were simulated using NGS data from the Personal Genome Project. Real data from the 1000 Genomes Project were also used to compare the two competing approaches. The average proportions of probability values lower than or equal to 0.05 amounted to 0.0496 for called genotypes and 0.0485 for the ratio of alternative allele counts to coverage in the null scenario, and to 0.69 for called genotypes and 0.75 for the ratio of alternative allele counts to coverage in the alternative scenario (9% power increase). The advantage in statistical power of the novel approach increased with decreasing coverage, with decreasing genotype quality and with decreasing allele frequency: a 124% power increase for variants with a minor allele frequency lower than 0.05. We provide computer code in R to implement the novel approach, which does not preclude the use of complementary data quality filters before or after identification of the most promising association signals.
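The abstract contrasts two per-individual covariates for the same variant: a called genotype (0, 1 or 2 alternative alleles) and the ratio of alternative allele reads to coverage. The Python sketch below illustrates that difference under stated assumptions; it is not the authors' R code. The naive binomial caller (per-read error rate `err=0.01`, flat genotype prior, Phred-scaled GQ capped at 99) is a hypothetical stand-in for a real variant caller, and the 1-df score test of a logistic regression of case status on the covariate (for 0/1/2 genotypes, the Cochran-Armitage trend test, up to the n versus n - 1 convention some implementations use) is one simple choice of association test, not necessarily one the authors used. All function names and the toy data are illustrative.

```python
import math


def allele_fraction(alt_reads, coverage):
    """Ratio of alternative-allele reads to all reads at one position."""
    if coverage <= 0:
        return float("nan")  # no reads: the fraction is undefined
    return alt_reads / coverage


def call_genotype(alt_reads, coverage, err=0.01):
    """Hypothetical naive diploid caller (not a real tool's algorithm).

    Picks the genotype (0, 1 or 2 alternative alleles) with the highest
    binomial likelihood and returns it with a Phred-scaled genotype
    quality (GQ): -10*log10 of the posterior probability that the call
    is wrong under a flat prior, capped at 99.
    """
    ref_reads = coverage - alt_reads
    # Probability that one read shows the alternative allele, per genotype.
    p_alt = {0: err, 1: 0.5, 2: 1.0 - err}
    log_lik = {
        g: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for g, p in p_alt.items()
    }
    best = max(log_lik, key=log_lik.get)
    # Normalise the likelihoods into posteriors (flat prior).
    top = log_lik[best]
    total = sum(math.exp(v - top) for v in log_lik.values())
    p_wrong = 1.0 - 1.0 / total
    gq = 99.0 if p_wrong <= 0.0 else min(99.0, -10.0 * math.log10(p_wrong))
    return best, gq


def score_test(x, y):
    """1-df score test of a logistic regression of y (0/1) on x.

    The statistic is n * r^2, where r is the Pearson correlation of x
    and y. Returns (chi-square statistic, p-value).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # no variation: no evidence of association
    chi2 = n * sxy ** 2 / (sxx * syy)
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical toy data: (alt reads, coverage, case status).
    toy = [(0, 4, 0), (1, 3, 0), (2, 5, 1), (3, 3, 1), (0, 6, 0), (4, 8, 1)]
    cases = [status for _, _, status in toy]
    called = [call_genotype(a, c)[0] for a, c, _ in toy]
    ratios = [allele_fraction(a, c) for a, c, _ in toy]
    print("called genotypes:", score_test(called, cases))
    print("allele fraction: ", score_test(ratios, cases))
```

In this sketch a 1-of-2 read split is called heterozygous with a GQ of only about 11, yet the called genotype enters the test as a hard 1, while the allele fraction carries the raw read evidence directly. That loss of information at low coverage is one plausible intuition for the abstract's finding that the advantage grows as coverage and genotype quality decrease.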

AUTHOR SUMMARY

Genetic association tests usually rely on called genotypes. We postulate here that the direct analysis of allele counts from sequence data improves the quality of statistical inference. To evaluate this hypothesis, we investigate simulated and real data using distinct statistical approaches. We demonstrate that association tests based on allele counts rather than called genotypes achieve higher statistical power with controlled type I error rates.
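The type I error rates and powers quoted in the abstract are averages of the proportion of p-values at or below 0.05 across simulated studies. A minimal Python sketch of that bookkeeping follows, assuming the quoted power gains are relative to the called-genotype power. The helper names are hypothetical, and `monte_carlo_se` is an illustrative addition: the abstract does not state how many replicates were simulated.

```python
import math


def rejection_rate(pvalues, alpha=0.05):
    """Share of p-values <= alpha across simulated studies.

    Under the null scenario this estimates the type I error rate;
    under the alternative scenario it estimates statistical power.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    return sum(1 for p in pvalues if p <= alpha) / len(pvalues)


def monte_carlo_se(rate, n_replicates):
    """Binomial standard error of a rejection rate from n_replicates studies."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)


def relative_power_increase(power_old, power_new):
    """Relative gain of power_new over power_old (0.10 means +10%)."""
    return (power_new - power_old) / power_old


# Powers reported in the abstract: called genotypes vs allele-count ratio.
print(round(relative_power_increase(0.69, 0.75), 3))  # 0.087
```

With the abstract's figures, (0.75 - 0.69) / 0.69 ≈ 0.087, consistent with the reported 9% increase. Once the number of replicates is known, `monte_carlo_se` indicates how far a null rejection rate such as 0.0496 or 0.0485 can be expected to stray from the nominal 0.05 by chance alone.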

Similar articles

1. Transmission Disequilibrium Tests Based on Read Counts for Low-Coverage Next-Generation Sequence Data.
Hum Hered. 2015;80(1):36-49. doi: 10.1159/000434645. Epub 2015 Aug 12.
2. Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection.
J Clin Bioinforma. 2012 Nov 19;2(1):19. doi: 10.1186/2043-9113-2-19.
3. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.
BMC Genomics. 2013;14 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-14-S1-S1. Epub 2013 Jan 21.
4. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.
Hum Hered. 2012;74(3-4):172-83. doi: 10.1159/000346824. Epub 2013 Apr 11.
5. Likelihood-based complex trait association testing for arbitrary depth sequencing data.
Bioinformatics. 2015 Sep 15;31(18):2955-62. doi: 10.1093/bioinformatics/btv307. Epub 2015 May 14.
6. PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
Genet Epidemiol. 2017 Jul;41(5):375-387. doi: 10.1002/gepi.22048. Epub 2017 May 31.
7. Variant callers for next-generation sequencing data: a comparison study.
PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.
8. Estimation of allele frequency and association mapping using next-generation sequencing data.
BMC Bioinformatics. 2011 Jun 11;12:231. doi: 10.1186/1471-2105-12-231.
9. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.
Bioinformatics. 2014 Aug 1;30(15):2179-88. doi: 10.1093/bioinformatics/btu196. Epub 2014 Apr 14.

Cited by

1. Genetic diversity and relationship of Bugesera and Rwamagana indigenous chicken populations with SASSO chickens using DArTseq SNPs.
PLoS One. 2025 Sep 12;20(9):e0331316. doi: 10.1371/journal.pone.0331316. eCollection 2025.
2. Effectiveness of DArTseq markers application in genetic diversity and population structure of indigenous chickens in Eastern Province of Rwanda.
BMC Genomics. 2024 Feb 19;25(1):193. doi: 10.1186/s12864-024-10089-5.
