Department of Plant Pathology, University of Kentucky, Lexington, KY 40546, USA.
BMC Genomics. 2010 Feb 2;11:87. doi: 10.1186/1471-2164-11-87.
Analysis of fungal genome sequence assemblies reveals that telomeres are poorly represented even though telomeric reads tend to be superabundant. We surmised that the problem might lie in the DNA shearing conditions used to create clone libraries for genome sequencing.
A shotgun strategy was used to sequence and assemble circular and linear cosmid DNAs sheared using conditions typical for a genome project. The DNA sheared in circular form assembled into a single sequence contig. However, the linearized cosmid produced an incomplete assembly because the two DNA termini, though greatly overrepresented in the clone library used for sequencing, were separated from neighboring sequences by gaps of approximately 1.4 and 1.8 kb. These gap sizes were reduced, but not eliminated, by shearing the linear cosmid into smaller fragments. Mapping of shearing breakpoints revealed a paucity of breaks in the subterminal regions of the linearized cosmid and also near chromosome ends of the fungus Neurospora crassa.
Together, our data indicate that the ends of linear DNA molecules are recalcitrant to hydrodynamic shearing. We propose that this causes DNA termini to be overrepresented in the resulting fragment population but ultimately prevents their incorporation into sequence assemblies.
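The proposed mechanism can be illustrated with a toy simulation. The Python sketch below is not the authors' analysis. Every parameter is a hypothetical choice made for illustration: a 40 kb linear molecule, 15 random breaks per copy, a 1.5 kb break-free zone at each end, 600-base reads, and a 1 to 4 kb size-selection window. The function names (`shear_linear`, `simulate`, `uncovered_runs`) are also hypothetical. The model assumes that breaks never fall inside the protected zones and takes one read from each end of every size-selected fragment. It then reports per-base read coverage and the number of fragment ends at each position.

```python
import random
from collections import Counter
from itertools import accumulate


def shear_linear(length, n_breaks, protected, rng):
    """Sorted break positions for one linear molecule (toy model).

    ASSUMPTION: breaks are uniform at random, except that none fall within
    `protected` bp of either end. This is a crude, hypothetical stand-in for
    terminal resistance to hydrodynamic shearing, not the authors' model."""
    return sorted(rng.randrange(protected, length - protected)
                  for _ in range(n_breaks))


def simulate(length=40_000, molecules=2_000, n_breaks=15, protected=1_500,
             read_len=600, min_size=1_000, max_size=4_000, seed=1):
    """Shear many copies of a linear molecule, size-select the fragments, and
    take one read from each fragment end. All defaults are hypothetical.

    Returns (per-base read coverage, Counter of fragment-end positions)."""
    rng = random.Random(seed)
    diff = [0] * (length + 1)            # difference array for coverage
    end_counts = Counter()
    for _ in range(molecules):
        edges = [0] + shear_linear(length, n_breaks, protected, rng) + [length]
        for start, end in zip(edges, edges[1:]):
            if not min_size <= end - start <= max_size:
                continue                 # hypothetical size-selection step
            end_counts[start] += 1
            end_counts[end] += 1
            # One read from each end of the insert, clipped to the fragment.
            for a, b in ((start, min(start + read_len, end)),
                         (max(end - read_len, start), end)):
                diff[a] += 1
                diff[b] -= 1
    coverage = list(accumulate(diff[:length]))
    return coverage, end_counts


def uncovered_runs(coverage):
    """Half-open (start, end) intervals where read coverage is zero."""
    runs, start = [], None
    for pos, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = pos
        elif depth > 0 and start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))
    return runs


if __name__ == "__main__":
    coverage, end_counts = simulate()
    interior = max(c for p, c in end_counts.items() if 0 < p < len(coverage))
    print("fragment ends at left terminus:", end_counts[0])
    print("max fragment ends at any interior position:", interior)
    print("uncovered runs:", uncovered_runs(coverage))
```

Under these assumptions, each retained terminal fragment contributes an end at one of the two termini, while interior ends are scattered over thousands of positions. As a result, the termini accumulate far more fragment ends than any single interior position. The break-free zones also leave uncovered stretches next to each terminus that no read can span: bases 600 to 899 on the left and roughly 39,100 to 39,399 on the right. Varying `protected` or `read_len` shows how the size of this stretch depends on the width of the break-free zone relative to read length. The model makes no claim about the physical cause of the resistance.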