Department of Computer Science and Engineering, University of California, Riverside, California, USA.
PLoS Comput Biol. 2013 Apr;9(4):e1003010. doi: 10.1371/journal.pcbi.1003010. Epub 2013 Apr 4.
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution), so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and that the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and that the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. For plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
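The contrast between combinatorial pooling and exhaustive barcoding can be made concrete. Instead of preparing one barcoded library per BAC, each BAC is placed into several pools, and the set of pools it occupies acts as its signature. Deconvolution then assigns a read to the BAC whose signature best explains the pools in which that read was seen. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses a simplified linear transversal design as a hypothetical stand-in for whatever pooling design the paper actually uses. It also assumes each read has already been reduced to the set of pools in which it (or its k-mers) was observed. The names `make_design`, `deconvolve` and `min_hits` are illustrative only.

```python
def _is_prime(q: int) -> bool:
    """Trial-division primality test (adequate for small pool parameters)."""
    return q > 1 and all(q % d for d in range(2, int(q ** 0.5) + 1))


def make_design(n_items: int, q: int, layers: int) -> list[frozenset[int]]:
    """Return, for each BAC index j, the set of pool ids it is placed in.

    Hypothetical linear transversal design (an assumption, not the paper's
    exact design): write j = a + b*q and, in layer l, put BAC j into pool
    l*q + (a + b*l) % q. With q prime, layers <= q and n_items <= q*q, every
    BAC lands in exactly `layers` pools and any two BACs share at most one pool.
    """
    if not _is_prime(q):
        raise ValueError("q must be prime")
    if layers > q or n_items > q * q:
        raise ValueError("need layers <= q and n_items <= q*q")
    signatures = []
    for j in range(n_items):
        a, b = j % q, j // q
        # One pool per layer; the l*q offset keeps pools of different layers distinct.
        signatures.append(frozenset(l * q + (a + b * l) % q for l in range(layers)))
    return signatures


def deconvolve(observed: set[int], signatures: list[frozenset[int]],
               min_hits: int) -> list[int]:
    """Return the BAC index (or tied indices) whose pool signature best
    overlaps the pools where a read was observed; [] if below min_hits."""
    scores = [len(sig & observed) for sig in signatures]
    best = max(scores)
    if best < min_hits:
        return []  # too little evidence: leave the read unassigned
    return [j for j, s in enumerate(scores) if s == best]
```

With `q = 7` and 4 layers, 49 BACs fit into 28 pools rather than 49 separate libraries. Each BAC sits in 4 pools, and any two BACs share at most one pool. That bounded overlap is what lets a read seen in only 3 of a BAC's 4 pools still be assigned unambiguously. A read from a region shared by two BACs scores equally for both and comes back as a tie, which a real pipeline would need to resolve separately.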