Nagarajan Niranjan, Read Timothy D, Pop Mihai
University of Maryland, College Park, MD 20742, USA.
Bioinformatics. 2008 May 15;24(10):1229-35. doi: 10.1093/bioinformatics/btn102. Epub 2008 Mar 20.
New, high-throughput sequencing technologies have made it feasible to cheaply generate vast amounts of sequence information from a genome of interest. The computational reconstruction of the complete sequence of a genome is complicated by specific features of these new sequencing technologies, such as the short length of the sequencing reads and the absence of mate-pair information. In this article, we propose methods to overcome such limitations by incorporating information from optical restriction maps.
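As a rough illustration of how an optical restriction map can constrain an assembly, the sketch below digests a contig in silico and slides the resulting fragment lengths along an ordered list of map fragment lengths. It is not the algorithm of the package described here: the function names, the cut-at-site-start convention, and the squared relative error score are all assumptions made for this example.

```python
def in_silico_digest(sequence, site):
    """Return fragment lengths from cutting `sequence` at each occurrence of `site`.

    Simplifying assumption: the cut falls at the first base of the
    recognition site, and only the forward strand is searched.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g. a site at position 0).
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


def best_placement(contig_fragments, map_fragments):
    """Return (offset, score) of the best ungapped placement, or None.

    Only interior contig fragments are compared, because the two end
    fragments are not bounded by restriction sites on both sides.
    Map fragment lengths are assumed to be positive.
    """
    inner = contig_fragments[1:-1]
    if not inner or len(inner) > len(map_fragments):
        return None
    best = None
    for offset in range(len(map_fragments) - len(inner) + 1):
        window = map_fragments[offset:offset + len(inner)]
        # Hypothetical score: sum of squared relative size differences.
        score = sum(((c - m) / m) ** 2 for c, m in zip(inner, window))
        if best is None or score < best[1]:
            best = (offset, score)
    return best


if __name__ == "__main__":
    contig = "AA" + "GAATTC" + "CCCC" + "GAATTC" + "TTTTTTTT" + "GAATTC" + "AA"
    fragments = in_silico_digest(contig, "GAATTC")
    print(fragments)
    print(best_placement(fragments, [30, 11, 13, 50]))
```

A realistic matcher would also have to tolerate missed or spurious cut sites, fragment sizing error, and reverse-complement orientation, none of which this sketch handles.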
We demonstrate the robustness of our methods to sequencing and assembly errors using extensive experiments on simulated datasets. We then present the results obtained by applying our algorithms to data generated from two bacterial genomes, Yersinia aldovae and Yersinia kristensenii. The resulting assemblies contain a single scaffold covering a large fraction of the respective genomes, suggesting that the careful use of optical maps can provide a cost-effective framework for the assembly of genomes.
The tools described here are available as an open-source package at ftp://ftp.cbcb.umd.edu/pub/software/soma