A sample selection strategy for next-generation sequencing.

Affiliation

Department of Preventive Medicine, Keck School of Medicine, USC, Los Angeles, California, USA.

Publication information

Genet Epidemiol. 2012 Nov;36(7):696-709. doi: 10.1002/gepi.21664. Epub 2012 Aug 3.

DOI:10.1002/gepi.21664

PMID:22865643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4272568/

Abstract

Next-generation sequencing technology provides us with vast amounts of sequence data. It is efficient and cheaper than previous sequencing technologies, but deep resequencing of entire samples is still expensive. Therefore, sensible strategies for choosing subsets of samples to sequence are required. Here we describe an algorithm for selection of a sub-sample of an existing sample if one has either of two possible goals in mind: maximizing the number of new polymorphic sites that are detected, or improving the efficiency with which the remaining unsequenced individuals can have their types imputed at newly discovered polymorphisms. We then describe a variation on our algorithm that is more focused on detecting rarer variants. We demonstrate the performance of our algorithm using simulated data and data from the 1000 Genomes Project.
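As a rough illustration of the first goal named in the abstract (choosing a sub-sample that exposes as many polymorphic sites as possible), the sketch below uses a generic greedy coverage heuristic. It is not the algorithm from the paper, whose details the abstract does not give. It assumes, hypothetically, that each individual has already been typed at some marker sites, and that diversity at those known sites can stand in for the chance of finding new ones. The function name `greedy_select` and the input layout are illustrative choices only.

```python
def greedy_select(carriers, k):
    """Greedily pick up to k individuals covering the most variant sites.

    carriers: dict mapping an individual ID to the set of site indices at
              which that individual carries a minor allele (hypothetical
              input layout, not taken from the paper).
    Returns (selected IDs in pick order, set of sites covered).
    """
    remaining = dict(carriers)
    selected = []
    covered = set()
    for _ in range(min(k, len(remaining))):
        # Choose the individual that adds the most not-yet-covered sites.
        # Iterating over sorted IDs makes ties resolve to the smallest ID.
        best = max(sorted(remaining),
                   key=lambda ind: len(remaining[ind] - covered))
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered


# Toy data: four individuals, five marker sites (indices 0..4).
toy_carriers = {
    "A": {0, 1, 2},
    "B": {2, 3},
    "C": {0, 1},
    "D": {4},
}
chosen, sites = greedy_select(toy_carriers, k=2)
print(chosen, sorted(sites))
```

A greedy pass like this is cheap and easy to reason about, but it only reflects variation at sites that are already known. A selection method aimed at imputation efficiency or at rarer variants, the other goals the abstract mentions, would need a different objective.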

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a22/4272568/aad6e2619e12/nihms636767f1.jpg
