Evaluation of allele frequency estimation using pooled sequencing data simulation.

Authors

Guo Yan, Samuels David C, Li Jiang, Clark Travis, Li Chung-I, Shyr Yu

Affiliation

Vanderbilt Ingram Cancer Center, Nashville, TN, USA.

Publication

ScientificWorldJournal. 2013;2013:895496. doi: 10.1155/2013/895496. Epub 2013 Feb 7.

DOI:10.1155/2013/895496
PMID:23476151
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3582166/
Abstract

Next-generation sequencing (NGS) technology has provided researchers with opportunities to study the genome in unprecedented detail. In particular, NGS is applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to the SNP chip, although for large studies the cost can be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce high levels of relative error rate (defined as error rate divided by targeted allele frequency) and that the error rate is more severe for low minor allele frequency SNPs than for high minor allele frequency SNPs. In order to overcome the error introduced by pooling, we recommend a large pool size and high average depth per sample.
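The abstract lists the factors the simulation varies but does not spell out the model. As a rough illustration only, the Python sketch below simulates one DNA pool under assumptions that are ours, not the authors': genotypes drawn under Hardy-Weinberg equilibrium (sampling variation), gamma-distributed DNA contributions with a hypothetical coefficient of variation `pooling_cv` (pooling variation), overall depth equal to pool size times average depth per sample, and reference allele preference reduced to a hypothetical `ref_bias`, the probability that an alternate-allele read is counted as reference. The relative error rate is taken as |estimate - true MAF| / true MAF against the population frequency. All function and parameter names are hypothetical.

```python
import random
import statistics


def simulate_pool_estimate(
    true_maf: float,
    pool_size: int,
    depth_per_sample: float,
    pooling_cv: float,
    ref_bias: float,
    rng: random.Random,
) -> float:
    """Return one simulated allele-frequency estimate from a single pool.

    Hypothetical, simplified model; not the model used in the paper.
    """
    # Sampling variation (assumed HWE): alt-allele count 0, 1 or 2 per diploid individual.
    genotypes = [
        int(rng.random() < true_maf) + int(rng.random() < true_maf)
        for _ in range(pool_size)
    ]
    # Pooling variation (assumption): unequal DNA contributions as gamma weights
    # with mean 1 and coefficient of variation pooling_cv.
    if pooling_cv > 0:
        shape = 1.0 / pooling_cv**2
        weights = [rng.gammavariate(shape, 1.0 / shape) for _ in range(pool_size)]
    else:
        weights = [1.0] * pool_size
    # Overall depth = pool size x average depth per sample.
    total_depth = max(1, round(pool_size * depth_per_sample))
    alt_reads = 0
    for person in rng.choices(range(pool_size), weights=weights, k=total_depth):
        # Each read comes from one of the chosen individual's two chromosomes.
        carries_alt = rng.random() < genotypes[person] / 2
        # Assumed reference preference: an alt read is counted as ref with prob ref_bias.
        if carries_alt and rng.random() >= ref_bias:
            alt_reads += 1
    return alt_reads / total_depth


def mean_relative_error(
    true_maf: float,
    pool_size: int,
    depth_per_sample: float,
    pooling_cv: float = 0.2,
    ref_bias: float = 0.0,
    replicates: int = 200,
    seed: int = 1,
) -> float:
    """Mean of |estimate - true_maf| / true_maf over simulated replicate pools."""
    if not 0 < true_maf < 1:
        raise ValueError("true_maf must lie strictly between 0 and 1")
    rng = random.Random(seed)
    errors = []
    for _ in range(replicates):
        estimate = simulate_pool_estimate(
            true_maf, pool_size, depth_per_sample, pooling_cv, ref_bias, rng
        )
        errors.append(abs(estimate - true_maf) / true_maf)
    return statistics.mean(errors)


if __name__ == "__main__":
    # Compare a small, shallow pool with a larger, deeper one across MAFs.
    for maf in (0.01, 0.05, 0.2):
        for pool_size, depth in ((20, 5), (100, 20)):
            err = mean_relative_error(maf, pool_size, depth)
            print(f"MAF={maf:.2f} pool={pool_size:3d} depth/sample={depth:2d} "
                  f"relative error={err:.3f}")
```

Because the relative error divides by the true frequency, a given absolute error weighs far more at a MAF of 0.01 than at 0.2, so in this sketch low-MAF SNPs show larger relative errors; raising `pool_size` and `depth_per_sample` shrinks both the genotype-sampling and read-count components.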

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c46/3582166/48a37f865c90/TSWJ2013-895496.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c46/3582166/ab0af42d9eeb/TSWJ2013-895496.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c46/3582166/ab14e382c0bf/TSWJ2013-895496.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c46/3582166/1d5f5f097ffb/TSWJ2013-895496.004.jpg

Similar articles

1
Evaluation of allele frequency estimation using pooled sequencing data simulation.
ScientificWorldJournal. 2013;2013:895496. doi: 10.1155/2013/895496. Epub 2013 Feb 7.
2
A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.
BMC Genomics. 2013;14 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-14-S1-S1. Epub 2013 Jan 21.
3
An evaluation of allele frequency estimation accuracy using pooled sequencing data.
Int J Comput Biol Drug Des. 2013;6(4):279-93. doi: 10.1504/IJCBDD.2013.056709. Epub 2013 Sep 30.
4
Design of association studies with pooled or un-pooled next-generation sequencing data.
Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.
5
Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping.
Mol Ecol. 2013 Jul;22(14):3766-79. doi: 10.1111/mec.12360. Epub 2013 Jun 4.
6
Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.
Genet Epidemiol. 2012 Sep;36(6):549-60. doi: 10.1002/gepi.21648. Epub 2012 Jun 6.
7
SNP calling by sequencing pooled samples.
BMC Bioinformatics. 2012 Sep 20;13:239. doi: 10.1186/1471-2105-13-239.
8
Comparison of genotyping using pooled DNA samples (allelotyping) and individual genotyping using the affymetrix genome-wide human SNP array 6.0.
BMC Genomics. 2013 Jul 26;14:506. doi: 10.1186/1471-2164-14-506.
9
Cost-effective genome-wide estimation of allele frequencies from pooled DNA in Atlantic salmon (Salmo salar L.).
BMC Genomics. 2013 Jan 16;14:12. doi: 10.1186/1471-2164-14-12.
10
Analysis and optimal design for association studies using next-generation sequencing with case-control pools.
Genet Epidemiol. 2012 Dec;36(8):870-81. doi: 10.1002/gepi.21681. Epub 2012 Sep 12.

Cited by

1
Comparing Pool-seq, Rapture, and GBS genotyping for inferring weak population structure: The American lobster (Homarus americanus) as a case study.
Ecol Evol. 2019 May 26;9(11):6606-6623. doi: 10.1002/ece3.5240. eCollection 2019 Jun.
2
Allele balance bias identifies systematic genotyping errors and false disease associations.
Hum Mutat. 2019 Jan;40(1):115-126. doi: 10.1002/humu.23674. Epub 2018 Nov 23.
3
Targeted sequencing of established and candidate colorectal cancer genes in the Colon Cancer Family Registry Cohort.
Oncotarget. 2017 Jun 21;8(55):93450-93463. doi: 10.18632/oncotarget.18596. eCollection 2017 Nov 7.
4
The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data.
BMC Genomics. 2017 Oct 3;18(Suppl 6):690. doi: 10.1186/s12864-017-4022-x.
5
Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data.
Front Genet. 2017 Jul 7;8:94. doi: 10.3389/fgene.2017.00094. eCollection 2017.
6
Power and sample size calculations for high-throughput sequencing-based experiments.
Brief Bioinform. 2018 Nov 27;19(6):1247-1255. doi: 10.1093/bib/bbx061.
7
Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus.
Sci Rep. 2017 Mar 31;7:45771. doi: 10.1038/srep45771.
8
Multi-perspective quality control of Illumina RNA sequencing data analysis.
Brief Funct Genomics. 2017 Jul 1;16(4):194-204. doi: 10.1093/bfgp/elw035.
9
Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing.
Forensic Sci Int Genet. 2015 Sep;18:131-9. doi: 10.1016/j.fsigen.2015.05.003. Epub 2015 May 6.
10
Statistical strategies for microRNAseq batch effect reduction.
Transl Cancer Res. 2014 Jun 1;3(3):260-265. doi: 10.3978/j.issn.2218-676X.2014.06.05.