Overlapping pools for high-throughput targeted resequencing.

Author information

Prabhu Snehit, Pe'er Itsik

Affiliation

Department of Computer Science, Columbia University, New York, New York 10025, USA.

Publication information

Genome Res. 2009 Jul;19(7):1254-61. doi: 10.1101/gr.088559.108. Epub 2009 May 15.

Abstract

Resequencing genomic DNA from pools of individuals is an effective strategy to detect new variants in targeted regions and compare them between cases and controls. There are numerous ways to assign individuals to the pools on which they are to be sequenced. The naïve, disjoint pooling scheme (many individuals to one pool) in predominant use today offers insight into allele frequencies, but does not reveal the identity of an allele carrier. We present a framework for overlapping pool design, where each individual sample is resequenced in several pools (many individuals to many pools). Upon discovering a variant, the set of pools where this variant is observed reveals the identity of its carrier. We formalize the mathematical framework for such pool designs and list the requirements of such designs. We specifically address three practical concerns for pooled resequencing designs: (1) false positives due to errors introduced during amplification and sequencing; (2) false negatives due to undersampling of particular alleles, aggravated by nonuniform coverage; and consequently, (3) ambiguous identification of individual carriers in the presence of errors. We build on the theory of error-correcting codes to design pools that overcome these pitfalls. We show that for practical parameters of resequencing studies, our designs guarantee a high probability of unambiguous singleton carrier identification while maintaining the features of naïve pools in terms of sensitivity, specificity, and the ability to estimate allele frequencies. We demonstrate the ability of our designs to extract rare variants using short-read data from the 1000 Genomes Pilot 3 project.
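The carrier-identification idea can be illustrated with a minimal Python sketch. It is not the construction used in the paper: as a stand-in, it greedily builds a constant-weight binary code in which each codeword marks the pools an individual is placed in, and it requires every pair of codewords to differ in at least `min_distance` pools (Hamming distance). For a singleton variant, the observed set of positive pools is decoded to the nearest codeword; with minimum distance d, up to (d - 1) // 2 pool-level errors (false-positive or false-negative pools) can be corrected. The function names and the parameters (10 pools, 3 pools per individual, minimum distance 4, 8 individuals) are hypothetical choices for illustration only.

```python
from itertools import combinations


def greedy_constant_weight_code(num_pools, weight, min_distance, num_individuals):
    """Hypothetical helper: greedily pick weight-`weight` subsets of the pools
    whose pairwise Hamming distance is at least `min_distance`.
    (A simple stand-in, not the paper's error-correcting-code construction.)"""
    code = []
    for support in combinations(range(num_pools), weight):
        word = frozenset(support)
        # Hamming distance between two 0/1 words = size of the symmetric difference.
        if all(len(word ^ other) >= min_distance for other in code):
            code.append(word)
            if len(code) == num_individuals:
                return code
    raise ValueError("not enough codewords for these parameters")


def pool_assignment(code, num_pools):
    """List, for each pool, the individuals placed in it (overlapping pools)."""
    pools = [[] for _ in range(num_pools)]
    for individual, word in enumerate(code):
        for pool in word:
            pools[pool].append(individual)
    return pools


def decode_carrier(code, observed_pools, min_distance):
    """Return the index of the unique carrier of a singleton variant, or None
    if the observed pool pattern cannot be decoded unambiguously."""
    observed = frozenset(observed_pools)
    max_errors = (min_distance - 1) // 2
    distances = [len(observed ^ word) for word in code]
    best = min(distances)
    winners = [i for i, d in enumerate(distances) if d == best]
    if best <= max_errors and len(winners) == 1:
        return winners[0]
    return None  # too many errors: carrier is ambiguous


code = greedy_constant_weight_code(num_pools=10, weight=3, min_distance=4, num_individuals=8)
pools = pool_assignment(code, num_pools=10)
```

In this sketch, individual 5 is assigned to pools {1, 4, 6}. If the variant is missed in pool 6 (a false negative), `decode_carrier(code, {1, 4}, 4)` still returns 5. With two errors, such as the pattern {1, 4, 9}, two codewords are equally close, so the function returns `None` rather than naming a wrong carrier.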


