Roach J C, Boysen C, Wang K, Hood L
Department of Molecular Biotechnology, University of Washington, Seattle 98195, USA.
Genomics. 1995 Mar 20;26(2):345-53. doi: 10.1016/0888-7543(95)80219-c.
Strategies for large-scale genomic DNA sequencing currently require physical mapping, followed by detailed mapping, and finally sequencing. The level of mapping detail determines the amount of effort, or sequence redundancy, required to finish a project. Current strategies attempt to find a balance between mapping and sequencing efforts. One such approach is to employ strategies that use sequence data to build physical maps. Such maps alleviate the need for prior mapping and reduce the final required sequence redundancy. To this end, the utility of correlating pairs of sequence data derived from both ends of subcloned templates is well recognized. However, optimal strategies employing such pairwise data have not been established. In the present work, we simulate and analyze the parameters of pairwise sequencing projects including template length, sequence read length, and total sequence redundancy. One pairwise strategy based on sequencing both ends of plasmid subclones is recommended and illustrated with raw data simulations. We find that pairwise strategies are effective with both small (cosmid) and large (megaYAC) targets and produce ordered sequence data with a high level of mapping completeness. They are ideal for fine-scale mapping and gene finding and as initial steps for either a high- or a low-redundancy sequencing effort. Such strategies are highly automatable.
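The three parameters named in the abstract (template length, read length, and total sequence redundancy) can be explored with a small Monte Carlo sketch. The code below is not the authors' simulation; it is a minimal illustration under stated assumptions: templates are placed uniformly at random on the target, reads are error-free, any overlap between two reads joins them into one sequence island, and two islands are considered linked when they hold the two end reads of the same template. The function name `simulate_pairwise` and all of its parameter names are hypothetical.

```python
import random


def simulate_pairwise(target_len, insert_len, read_len, redundancy, seed=0):
    """Toy simulation of sequencing both ends of random subclones.

    Assumptions (illustrative only): uniform template placement,
    error-free reads, and any overlap merges two reads into one island.
    """
    if not (0 < read_len <= insert_len <= target_len):
        raise ValueError("need 0 < read_len <= insert_len <= target_len")

    rng = random.Random(seed)
    # Redundancy = total bases read / target length; two reads per template.
    n_templates = round(redundancy * target_len / (2 * read_len))

    reads = []  # (start, end, template_id), half-open intervals
    for t in range(n_templates):
        # Templates must fit inside the target, so the extreme ends of
        # the target are sampled less often than the middle.
        s = rng.randint(0, target_len - insert_len)
        reads.append((s, s + read_len, t))
        reads.append((s + insert_len - read_len, s + insert_len, t))
    reads.sort()

    # Merge overlapping reads into sequence islands (contigs).
    contigs = []          # [start, end]
    islands_of = {}       # template_id -> island indices of its two reads
    for start, end, t in reads:
        if contigs and start < contigs[-1][1]:
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
        islands_of.setdefault(t, []).append(len(contigs) - 1)

    # Union-find: link islands that share a template's two end reads.
    parent = list(range(len(contigs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in islands_of.values():
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    covered = sum(end - start for start, end in contigs)
    return {
        "templates": n_templates,
        "coverage": covered / target_len,
        "contigs": len(contigs),
        "scaffolds": len({find(i) for i in range(len(contigs))}),
    }


if __name__ == "__main__":
    # Hypothetical cosmid-sized target with plasmid-sized templates.
    for redundancy in (0.5, 1.0, 2.0, 4.0):
        print(redundancy, simulate_pairwise(40_000, 3_000, 400, redundancy, seed=1))
```

Comparing the `contigs` and `scaffolds` counts at a given redundancy shows how much ordering information the end pairs add beyond plain read overlap. The numbers depend on the random seed and on the simplifying assumptions above, so they should not be read as reproducing the paper's results.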