Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America.
PLoS One. 2011 Feb 25;6(2):e16717. doi: 10.1371/journal.pone.0016717.
Here we develop computational methods to facilitate the use of 454 whole-genome shotgun sequencing to identify mutations in Escherichia coli K12. We had Roche sequence eight related strains derived as spontaneous mutants in a background that lacks a whole-genome sequence. Roche provided difference tables based on assembling each genome against the reference strain E. coli MG1655 (NC_000913). Owing to the evolutionary distance from MG1655, these tables contained large numbers of both false negatives and false positives. By manual analysis of the dataset, we detected all the known mutations (24 at nine locations) and identified and genetically confirmed new mutations necessary and sufficient for the phenotypes we had selected in four strains. We then had Roche assemble contigs de novo, which we further assembled into full-length pseudomolecules based on synteny with MG1655. This hybrid method facilitated detection of insertion mutations and allowed annotation to be transferred from MG1655. After removing one genome with less than the optimal 20- to 30-fold sequence coverage, we identified 544 putative polymorphisms that included all of the known and selected mutations apart from insertions. Finally, we detected seven new mutations among a total of only 41 candidates by comparing single genomes to composite data for the remaining six and using a ranking system to penalize homopolymer sequencing errors and misassembly errors. An additional benefit of the analysis is a table of differences between MG1655 and NCM3722, a physiologically robust E. coli wild-type strain. Both projects were greatly facilitated by use of comparative genomics tools in the CoGe software package (http://genomevolution.org/).
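The final filtering step (comparing one genome to composite data for the others, then ranking candidates so that likely homopolymer and misassembly artifacts fall to the bottom) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the variant representation `(position, ref, alt)`, the function names, the penalty weights, and the thresholds `run_threshold` and `edge_window` are all assumptions chosen for illustration.

```python
def homopolymer_run(seq, pos):
    """Length of the run of identical bases that contains index pos."""
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right < len(seq) - 1 and seq[right + 1] == base:
        right += 1
    return right - left + 1


def strain_specific(variants_by_strain, strain):
    """Variants called in `strain` but absent from the composite of all other strains."""
    composite = set()
    for name, variants in variants_by_strain.items():
        if name != strain:
            composite |= variants
    return variants_by_strain[strain] - composite


def penalty(variant, reference, contig_ends=(), run_threshold=4, edge_window=50):
    """Hypothetical penalty: higher means more likely to be an artifact."""
    pos, _ref, _alt = variant
    score = 0
    # 454 reads are error-prone in homopolymer runs, so penalize calls inside long runs.
    if homopolymer_run(reference, pos) >= run_threshold:
        score += 2
    # Calls close to a contig boundary are more likely to reflect misassembly.
    if any(abs(pos - end) <= edge_window for end in contig_ends):
        score += 1
    return score


def rank_candidates(variants, reference, contig_ends=()):
    """Sort candidates so the least-penalized (most credible) come first."""
    return sorted(variants, key=lambda v: (penalty(v, reference, contig_ends), v[0]))


# Toy example with a 17-base "reference" and three strains (0-based positions).
reference = "ACGTAAAAAGCTAGCTA"
variants_by_strain = {
    "strain1": {(5, "A", "-"), (10, "C", "T"), (2, "G", "A")},
    "strain2": {(2, "G", "A")},
    "strain3": {(2, "G", "A")},
}
candidates = strain_specific(variants_by_strain, "strain1")
ranked = rank_candidates(candidates, reference)
print(ranked)
```

In this toy case the variant shared by all strains is discarded as background, and the deletion inside the run of five A's is ranked below the substitution at position 10. A real pipeline would also need to weigh read depth and base quality, which this sketch ignores.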