Department of Biology, New York University, 100 Washington Square East, Rm 1009, New York, NY 10003-6688, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1642-52. doi: 10.1109/TCBB.2011.39.
Popular current methods for designing oligonucleotide tiling arrays still rely on simple criteria such as Hamming distance or longest common factors, neglecting base-stacking effects, which contribute strongly to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method that uses hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter that uses weighted ungapped q-grams to arrive at seeds. The CHP computation is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and the quality of the selected probes. Compared with prior approaches, we obtain very favorable results with respect to specificity and sensitivity for cross-hybridization, and to genome coverage with high-specificity probes. Combining OSProbes with our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays that balance probe distance, hybridization conditions, and uniqueness of hybridization.
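The abstract contrasts energy-based scoring with length-based criteria such as longest common factors. The sketch below is not the paper's CHP or Nearest Neighbor Alignment: the abstract specifies neither the t-gap metric nor the q-gram weights, so this is a simplified stand-in that handles only exact, ungapped matches. It seeds with ungapped q-grams whose own stacking energy passes a threshold (a hypothetical stand-in for the weighted q-gram filter), extends each seed along its diagonal, and scores the resulting match by summing nearest-neighbor stacking free energies. The stacking values are the SantaLucia (1998) unified dG37 set as recalled here and should be checked before any real use; initiation, terminal, mismatch and loop terms are omitted. Both sequences are read in the same orientation, so a match means the probe could pair with the complementary strand at that site. All function names and the `q`/`max_dg` defaults are hypothetical.

```python
from collections import defaultdict

# Nearest-neighbor stacking free energies (dG37, kcal/mol) for Watson-Crick
# steps, written 5'->3' on one strand. Values: SantaLucia (1998) unified set,
# as recalled here; verify against the published table before real use.
_UNIQUE_STEPS = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


# A step and its reverse complement are the same stack (e.g. CA/GT == TG/AC),
# so expand the 10 unique steps to all 16 dinucleotides.
NN = dict(_UNIQUE_STEPS)
for step, dg in _UNIQUE_STEPS.items():
    NN[revcomp(step)] = dg


def stack_energy(seq: str) -> float:
    """Sum of stacking energies for seq paired with its exact complement.
    Initiation, terminal, mismatch and loop terms are deliberately omitted."""
    return sum(NN[seq[k:k + 2]] for k in range(len(seq) - 1))


def seeds(probe: str, target: str, q: int, max_dg: float):
    """Yield (i, j) with probe[i:i+q] == target[j:j+q], keeping only q-grams
    whose own stacking energy is <= max_dg (hypothetical energy weighting)."""
    index = defaultdict(list)
    for j in range(len(target) - q + 1):
        index[target[j:j + q]].append(j)
    for i in range(len(probe) - q + 1):
        gram = probe[i:i + q]
        if stack_energy(gram) <= max_dg:
            for j in index.get(gram, ()):
                yield i, j


def best_exact_duplex(probe: str, target: str, q: int = 4, max_dg: float = -2.0):
    """Most negative stacking energy over the maximal exact matches reached from
    seeds. Returns (dG, matched_substring), or (0.0, "") if no seed survives."""
    best = (0.0, "")
    seen = set()
    for i, j in seeds(probe, target, q, max_dg):
        # Slide back along the diagonal to the start of the exact match.
        while i > 0 and j > 0 and probe[i - 1] == target[j - 1]:
            i, j = i - 1, j - 1
        if (i, j) in seen:  # this run was already scored from another seed
            continue
        seen.add((i, j))
        # Extend forward while bases keep matching.
        k = 0
        while i + k < len(probe) and j + k < len(target) and probe[i + k] == target[j + k]:
            k += 1
        match = probe[i:i + k]
        dg = stack_energy(match)
        if dg < best[0]:
            best = (dg, match)
    return best


if __name__ == "__main__":
    probe = "ATATATATGCGCGCGC"
    for site in ("CCCCATATATATCCCC", "AAAAGCGCGCGCAAAA"):
        dg, match = best_exact_duplex(probe, site)
        print(site, match, round(dg, 2))
```

In the usage example, both off-target sites share an 8-nt exact match with the probe, so a longest-common-factor criterion scores them identically, while the stacking sum gives about -5.26 kcal/mol for the AT-rich match and -15.47 kcal/mol for the GC-rich one. Tightening `max_dg` to -4.0 discards the AT-rich seeds altogether, which loosely illustrates how a stricter seed filter trades sensitivity for speed.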