Biopolymer Design LLC, Acton, Massachusetts, United States of America.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.
PLoS One. 2018 Jun 21;13(6):e0199162. doi: 10.1371/journal.pone.0199162. eCollection 2018.
Off-target interaction of oligoprobes with partially complementary nucleotide sequences is a problem for many biotechniques. The goal of this study was to identify oligoprobe sequence characteristics that control the ratio between on-target and off-target hybridization. To understand the complex interplay between specific and genome-wide off-target (cross-hybridization) signals, we analyzed a database derived from genomic comparison hybridization experiments performed with an Affymetrix tiling array. The database included two types of probes, with signals derived from (i) a combination of specific signal and cross-hybridization and (ii) genomic cross-hybridization only. All probes in the database were grouped into bins according to their sequence characteristics, and the two hybridization signals were averaged separately within each bin. For the selection of specific probes, we analyzed the following sequence characteristics: vulnerability to self-folding, nucleotide composition bias, numbers of G nucleotides and GGG-blocks, and occurrence of the probe's k-mers in the human genome. Increases in the bin ranges for these characteristics are accompanied by a decrease in hybridization specificity, defined as the ratio between the specific and cross-hybridization signals. By contrast, both averaged hybridization signals grow as probe binding energy increases, with the specific signal increasing significantly faster than the cross-hybridization signal. The same trend is evident for the S function, which combines probe binding energy and the occurrence of the probe's k-mers in the genome into a single measure. Using S allows a larger number of specific probes to be extracted than using binding energy alone. Thus, we showed that high values of specific and cross-hybridization signals are not mutually exclusive for probes with high values of binding energy and S. In this study, the application of a new set of sequence characteristics allows detection of probes that are highly specific to their targets, for array design and other biotechniques that require selection of specific probes.
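The binning procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and everything in it is an assumption made for illustration: the probe record layout, the labels "target" (type i) and "cross" (type ii), the choice to count each run of three or more G's as one GGG-block, and the use of the plain ratio of the two bin means as the specificity. The toy sequences and signal values are invented and carry no experimental meaning.

```python
import re
from collections import defaultdict


def g_count(seq):
    """Number of G nucleotides in a probe sequence."""
    return seq.upper().count("G")


def ggg_blocks(seq):
    """Number of GGG-blocks; ASSUMPTION: each run of >= 3 G's is one block."""
    return len(re.findall(r"G{3,}", seq.upper()))


def bin_means(probes, feature):
    """Group probes by feature(sequence) and average the two signal types.

    probes: iterable of (sequence, kind, signal), where kind is "target"
    (specific signal plus cross-hybridization) or "cross"
    (cross-hybridization only). These labels are hypothetical.
    Returns {bin: (mean_target, mean_cross, ratio)} for every bin that
    contains at least one probe of each kind.
    """
    signals = defaultdict(lambda: {"target": [], "cross": []})
    for seq, kind, signal in probes:
        signals[feature(seq)][kind].append(signal)

    table = {}
    for key in sorted(signals):
        target, cross = signals[key]["target"], signals[key]["cross"]
        if not target or not cross:
            continue  # a ratio needs both probe types in the bin
        mean_target = sum(target) / len(target)
        mean_cross = sum(cross) / len(cross)
        # ASSUMPTION: specificity taken as the plain ratio of the two means.
        table[key] = (mean_target, mean_cross, mean_target / mean_cross)
    return table


# Invented toy data, for illustration only.
toy_probes = [
    ("ACGTACGT", "target", 800.0),
    ("ACGTACGT", "cross", 100.0),
    ("ACGTGGGA", "target", 900.0),
    ("TTGGGGCA", "target", 700.0),
    ("ACGTGGGA", "cross", 300.0),
]

table = bin_means(toy_probes, ggg_blocks)
for n_blocks, (mean_target, mean_cross, ratio) in table.items():
    print(n_blocks, mean_target, mean_cross, round(ratio, 2))
```

Passing `g_count` instead of `ggg_blocks` bins the same probes by their number of G nucleotides. The S function is not defined in the abstract, so it is deliberately left out of the sketch.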