NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
BMC Genomics. 2010 Jan 11;11:21. doi: 10.1186/1471-2164-11-21.
The approaches for shotgun-based sequencing of vertebrate genomes are now well established and have resulted in the generation of numerous draft whole-genome sequence assemblies. In contrast, the process of refining those assemblies to improve contiguity and increase accuracy (known as 'sequence finishing') remains tedious, labor-intensive, and expensive. As a result, the vast majority of vertebrate genome sequences generated to date remain at a draft stage.
To date, our genome sequencing efforts have focused on comparative studies of targeted genomic regions, requiring sequence finishing of large blocks of orthologous sequence (average size 0.5-2 Mb) from various subsets of 75 vertebrates. This experience has provided a unique opportunity to compare the relative effort required to finish shotgun-generated genome sequence assemblies from different species, and we report that comparison here. Importantly, we found that the sequence assemblies generated for the same orthologous regions from various vertebrates show substantial variation with respect to misassemblies and, in particular, the frequency and characteristics of sequence gaps. As a consequence, the work required to finish different species' sequences varied greatly. Application of the same standardized methods for finishing provided a novel opportunity to "assay" characteristics of genome sequences among many vertebrate species. Notably, many of the problems we encountered during sequence finishing reflect unique architectural features of a particular vertebrate's genome, which in some cases may have important functional and/or evolutionary implications. Finally, based on our analyses, we have been able to improve our procedures to overcome some of these problems and to increase the overall efficiency of the sequence-finishing process, although significant challenges remain.
Our findings have important implications for the eventual finishing of the draft whole-genome sequences that have now been generated for a large number of vertebrates.