Chumakov S, Belapurkar C, Putonti C, Li T-B, Pettitt B M, Fox G E, Willson R C, Fofanov Yu
Department of Computer Science, University of Houston, Houston, TX 77204, USA.
J Biol Phys Chem. 2005 Dec 1;5(4):121-128. doi: 10.4024/40501.jbpc.05.04.
It is shown that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other. Even genomes of extremely closely related organisms, such as strains of the same species, can be distinguished in this way. One evident way to implement this approach in a practical assay is with hybridization arrays. It is envisioned that a single universal array can be readily designed that would allow identification of any bacterium whose pattern appears in a database of known patterns. We performed in silico experiments to test this idea. Calculations using 105 publicly available, completely sequenced microbial genomes allowed us to determine appropriate values for the test oligonucleotide length, n, and the number of probe sequences. Randomly chosen n-mers with a constant G + C content were used to form an in silico array and to determine (a) how many n-mers from each genome would hybridize to this chip, and (b) how different the fingerprints of different genomes would be. With an appropriate choice of random oligomer length, the same approach can also be used to identify viral or eukaryotic genomes.
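The fingerprinting idea in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: hybridization is modeled as an exact match on either strand, "constant G + C content" is taken to mean that every probe contains exactly `gc_count` G/C bases, fingerprints are compared by Hamming distance, and all function names (`make_array`, `fingerprint`, `hamming`, `expected_hit_fraction`) are hypothetical. `expected_hit_fraction` additionally assumes a uniform random genome, which real genomes are not.

```python
import math
import random
from collections import defaultdict

# Complement table for DNA; assumes the sequence uses only A/C/G/T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_probe(n: int, gc_count: int, rng: random.Random) -> str:
    """Random n-mer with exactly gc_count G/C bases (assumed reading of 'constant G + C')."""
    bases = [rng.choice("GC") for _ in range(gc_count)]
    bases += [rng.choice("AT") for _ in range(n - gc_count)]
    rng.shuffle(bases)  # randomize where the G/C bases sit
    return "".join(bases)


def make_array(num_probes: int, n: int, gc_count: int, seed: int = 0) -> list[str]:
    """Hypothetical in silico array: a reproducible list of random probes."""
    rng = random.Random(seed)
    return [random_probe(n, gc_count, rng) for _ in range(num_probes)]


def fingerprint(genome: str, probes: list[str]) -> list[int]:
    """One presence/absence bit per probe: 1 if it occurs exactly on either strand.

    Exact matching is a simplifying stand-in for real hybridization.
    """
    n = len(probes[0])
    # Index each probe and its reverse complement, so a single forward
    # scan of the genome detects occurrences on both strands.
    lookup: dict[str, list[int]] = defaultdict(list)
    for idx, p in enumerate(probes):
        lookup[p].append(idx)
        lookup[reverse_complement(p)].append(idx)
    bits = [0] * len(probes)
    g = genome.upper()
    for i in range(len(g) - n + 1):
        for idx in lookup.get(g[i:i + n], ()):
            bits[idx] = 1
    return bits


def hamming(a: list[int], b: list[int]) -> int:
    """Number of probes whose presence/absence differs between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))


def expected_hit_fraction(n: int, genome_length: int) -> float:
    """P(a given n-mer occurs on either strand) in a uniform random genome.

    Computes 1 - (1 - 4**-n) ** (2 * L) in a numerically stable way.
    """
    return -math.expm1(2 * genome_length * math.log1p(-(4.0 ** -n)))


if __name__ == "__main__":
    rng = random.Random(42)
    probes = make_array(1000, 12, 6)  # 1000 probes, n = 12, 50% G + C (illustrative)
    # Two toy random "genomes"; real use would load sequenced genomes instead.
    g1 = "".join(rng.choice("ACGT") for _ in range(200_000))
    g2 = "".join(rng.choice("ACGT") for _ in range(200_000))
    f1, f2 = fingerprint(g1, probes), fingerprint(g2, probes)
    print("hits:", sum(f1), sum(f2), "distance:", hamming(f1, f2))
    print("expected hit fraction, n=12, 5 Mb:", expected_hit_fraction(12, 5_000_000))
```

Under the uniform-random model, a given 12-mer has roughly a 0.45 chance of occurring somewhere in a 5 Mb genome, while a 13-mer has about 0.14, so lengths in the 12-13 range leave a substantial share of probes both present and absent for genomes of bacterial size. Real genomes deviate from this model (composition bias, repeats), so the estimate is only a rough guide.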