Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521, USA.
BMC Bioinformatics. 2011 Oct 10;12:394. doi: 10.1186/1471-2105-12-394.
Population levels of microbial phylotypes can be examined using a hybridization-based method that uses a small set of computationally designed DNA probes targeted to a gene common to all of them. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint) against the probe set. However, it does so without taking into account the similarity between training gene sequences or their putative taxonomic classifications. We present an improved algorithm for probe set selection that uses the available taxonomic information of the training gene sequences and attempts to choose probes such that the resulting binary fingerprints cluster into real taxonomic groups.
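The notion of a binary fingerprint can be illustrated with a minimal sketch. It is not the published algorithm: it assumes, purely for illustration, that a probe "hybridizes" whenever it occurs as an exact substring of the gene sequence, and the function names, sequences, and probes are all hypothetical toy values.

```python
from collections import defaultdict


def fingerprint(sequence, probes):
    """Return one 0/1 flag per probe for a single gene sequence."""
    # Assumption: exact substring match stands in for theoretical hybridization.
    return tuple(int(probe in sequence) for probe in probes)


def group_by_fingerprint(sequences, probes):
    """Map each distinct fingerprint to the names of the sequences sharing it."""
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[fingerprint(seq, probes)].append(name)
    return dict(groups)


# Toy, made-up training sequences and probes (not real data).
sequences = {
    "seqA": "ACGTACGTGG",
    "seqB": "ACGTTTGTGG",
    "seqC": "TTGTACGAAA",
}
probes = ["ACGT", "TTGT", "GAAA"]

groups = group_by_fingerprint(sequences, probes)
for fp, names in groups.items():
    print(fp, names)
```

With this toy probe set every sequence receives a distinct fingerprint, which is the goal of the previous algorithm as described; the improved algorithm additionally aims for fingerprints that group by taxonomy.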
Gene sequences that manifest identical fingerprints are more likely to be from the same taxonomic group when the probes are chosen by the new algorithm than when they are chosen by the previous algorithm. When sequences with identical fingerprints do come from different taxonomic groups, their underlying DNA sequences are more similar to each other for probe sets made with the new algorithm than for those made with the previous one. Complete removal of large taxonomic groups from the training data does not greatly decrease the ability of probe sets to distinguish those groups.
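One way to quantify whether sequences with identical fingerprints share a taxonomic group is sketched below. The metric is an assumption made for illustration and is not necessarily the measure used in the paper; the function name `pure_group_fraction`, the fingerprints, and the taxon labels are hypothetical.

```python
from collections import defaultdict


def pure_group_fraction(fingerprints, taxa):
    """Among fingerprints shared by two or more sequences, return the
    fraction whose sequences all belong to a single taxonomic group.
    Returns None when no fingerprint is shared."""
    taxa_seen = defaultdict(set)
    counts = defaultdict(int)
    for name, fp in fingerprints.items():
        taxa_seen[fp].add(taxa[name])
        counts[fp] += 1
    shared = [fp for fp, n in counts.items() if n > 1]
    if not shared:
        return None
    pure = sum(1 for fp in shared if len(taxa_seen[fp]) == 1)
    return pure / len(shared)


# Toy, made-up fingerprints and taxon labels (not real data).
fingerprints = {
    "s1": (1, 0), "s2": (1, 0),   # shared fingerprint, same taxon
    "s3": (0, 1), "s4": (0, 1),   # shared fingerprint, different taxa
    "s5": (1, 1),                 # unique fingerprint, ignored by the metric
}
taxa = {"s1": "GenusX", "s2": "GenusX", "s3": "GenusX",
        "s4": "GenusY", "s5": "GenusY"}

print(pure_group_fraction(fingerprints, taxa))
```

Under this assumed metric, a probe set whose fingerprints track taxonomy more closely would score nearer to 1.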
Probe sets made with the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting that novel microbes can also be detected.