Swaminathan Kankshita, Varala Kranthi, Hudson Matthew E
Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA.
BMC Genomics. 2007 May 24;8:132. doi: 10.1186/1471-2164-8-132.
Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.
We sequenced 78 million base pairs of randomly sheared soybean DNA that passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The survey sequences were used to determine copy number across regions of large genomic clones or contigs and to discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis).
This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
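The copy-number assessment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes survey reads have already been aligned to a clone by some external tool, and that the aligned bases have been summed per fixed-size window. The function names, the window size, and the genome size of roughly 1.1 Gb used in the example are assumptions for illustration; only the 78 million base-pair survey size comes from the abstract. The idea is that a single-copy region should be covered at about the survey's genome-wide average depth, so the ratio of observed depth to that average approximates the copy number.

```python
from typing import List


def expected_coverage(survey_bases: int, genome_size: int) -> float:
    """Average depth a single-copy sequence should receive from the survey."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return survey_bases / genome_size


def estimate_copy_number(
    aligned_bases_per_window: List[int],
    window_size: int,
    survey_bases: int,
    genome_size: int,
) -> List[float]:
    """Estimate copy number per window as observed depth / expected depth.

    aligned_bases_per_window: total survey bases aligned to each window of
    the clone (hypothetical input, produced by an alignment step not shown).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    single_copy_depth = expected_coverage(survey_bases, genome_size)
    return [
        (bases / window_size) / single_copy_depth
        for bases in aligned_bases_per_window
    ]


if __name__ == "__main__":
    # 78 Mb of survey sequence (from the abstract); genome size is an
    # assumed round figure used only for illustration.
    survey_bases = 78_000_000
    assumed_genome_size = 1_100_000_000
    # Hypothetical aligned-base totals for three 1 kb windows of a clone.
    windows = [70, 7_000, 0]
    for start, copies in zip(
        range(0, 3_000, 1_000),
        estimate_copy_number(windows, 1_000, survey_bases, assumed_genome_size),
    ):
        print(f"window starting at {start}: ~{copies:.1f} copies")
```

At such low survey coverage, estimates for single-copy windows are noisy, so a sketch like this is most informative for sequences present in many copies.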