National Research Council of Canada, Institute of Information Technology, 100 des Aboiteaux Street, Suite 1100, Moncton, NB E1A7R1, Canada.
BMC Bioinformatics. 2010 Feb 24;11:105. doi: 10.1186/1471-2105-11-105.
Estimation of DNA duplex hybridization free energy is widely used for predicting cross-hybridizations in DNA computing and microarray experiments. A number of software programs based on different methods and parametrizations are available for the theoretical estimation of duplex free energies. However, estimates obtained with different methods sometimes differ significantly, making it difficult to decide which value is accurate.
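Most duplex free energy programs build on the nearest-neighbour (NN) model, in which ΔG°37 is the sum of stacking terms for each adjacent base-pair doublet plus initiation terms. The sketch below assumes the unified doublet parameters of SantaLucia (1998) at 1 M NaCl and 37 °C. It is not one of the four programs compared in this work, the function names are hypothetical, and the parameter values should be checked against the original table before being relied on.

```python
# Doublet nearest-neighbour dG37 sketch (kcal/mol, 1 M NaCl, 37 C).
# Assumed values: SantaLucia (1998) unified parameters, keyed by the 5'->3'
# top-strand dinucleotide; a stack and its reverse complement (e.g. CA and TG)
# describe the same duplex step and share a value.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_GC = 0.98   # initiation, per terminal G.C pair
INIT_AT = 1.03   # initiation, per terminal A.T pair
SYMMETRY = 0.43  # penalty for self-complementary duplexes

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def duplex_dg37(seq: str) -> float:
    """dG37 of the perfect-match duplex formed by `seq` and its complement."""
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected an A/C/G/T sequence of length >= 2")
    # Sum the stacking free energy of every overlapping dinucleotide.
    dg = sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))
    # One initiation term for each duplex end, chosen by its terminal pair.
    for end in (seq[0], seq[-1]):
        dg += INIT_GC if end in "GC" else INIT_AT
    if seq == reverse_complement(seq):
        dg += SYMMETRY
    return dg


print(f"{duplex_dg37('CGTTGA'):.2f}")  # -5.35
```

For CGTTGA this adds 0.98 + 1.03 − 2.17 − 1.44 − 1.00 − 1.45 − 1.30 = −5.35 kcal/mol. Mismatches, dangling ends, and salt or temperature corrections are outside the scope of this sketch.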
We present in this study a quantitative comparison of the similarities and differences among four published DNA/DNA duplex free energy calculation methods and an extended Nearest-Neighbour Model for perfect matches based on triplet interactions. The comparison was performed on a benchmark data set of 695 short oligo pairs that we collected and manually curated from 29 publications. Sequence lengths range from 4 to 30 nucleotides, and the set spans a wide range of GC content. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of existing models in terms of both correlation and root mean squared error. The proposed model was trained on experimental data whose temperature, sodium concentration and oligonucleotide concentration span a wide range of values, which gives the model greater generalization power when estimating free energies of DNA duplexes under non-standard experimental conditions.
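The abstract does not give the form of the triplet extension. One way to represent such a model, assuming it is additive over overlapping trinucleotide windows in the same way the doublet model is additive over dinucleotides, is to count canonical triplets per sequence; those counts form the rows of a design matrix to which per-triplet free energy terms could be fitted. The names below and the linear framing are illustrative assumptions, not the authors' parametrization, which also accounts for temperature, sodium and oligonucleotide concentration.

```python
from collections import Counter
from itertools import product

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s: str) -> str:
    return "".join(COMP[b] for b in reversed(s))


def canonical(triplet: str) -> str:
    # A triplet and its reverse complement describe the same duplex segment,
    # so both map to one class (the lexicographically smaller string).
    return min(triplet, revcomp(triplet))


# 64 triplets collapse to 32 classes: an odd-length word never equals its own
# reverse complement (the middle base would have to pair with itself).
TRIPLET_CLASSES = sorted({canonical("".join(p)) for p in product("ACGT", repeat=3)})


def triplet_counts(seq: str) -> Counter:
    """Count canonical classes over all overlapping trinucleotide windows."""
    seq = seq.upper()
    return Counter(canonical(seq[i:i + 3]) for i in range(len(seq) - 2))


def design_row(seq: str) -> list:
    """One design-matrix row (counts in TRIPLET_CLASSES order) for fitting."""
    counts = triplet_counts(seq)
    return [counts[t] for t in TRIPLET_CLASSES]


def predict_dg(seq: str, params: dict, intercept: float = 0.0) -> float:
    """Hypothetical additive triplet model: intercept + sum of class terms.
    `params` maps each class to a free energy term obtained from some fit."""
    return intercept + sum(n * params[t] for t, n in triplet_counts(seq).items())


print(len(TRIPLET_CLASSES), triplet_counts("CGTTGA"))
```

Stacking `design_row` outputs for training sequences and solving against measured ΔG values (for example with `numpy.linalg.lstsq`) would give per-class terms under this assumption. Collapsing each triplet with its reverse complement ensures that reading either strand of a duplex yields the same feature vector.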
Based on our preliminary results, we conclude that there are no statistically significant differences among the free energy approximations obtained with four publicly available and widely used programs when they are benchmarked against the 695 short oligo pairs that we collected and curated from 29 publications. The extended Nearest-Neighbour Model based on triplet interactions presented in this work estimates free energies of perfect-match duplexes accurately under both standard and non-standard experimental conditions and may serve as a baseline for further work in this area.
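The two agreement measures used in the comparison, Pearson correlation and root mean squared error between predicted and experimental ΔG values, can be computed as in the sketch below. The input lists are hypothetical, and the sketch does not reproduce the significance test behind the conclusion above, which the abstract does not name.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed dG values."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and measurements."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    if sp == 0 or so == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sp * so)


# Hypothetical predictions from one method vs. hypothetical measurements (kcal/mol).
predicted = [-5.4, -8.1, -11.9, -6.7]
measured = [-5.1, -8.6, -12.3, -6.2]
print(f"r = {pearson_r(predicted, measured):.3f}, RMSE = {rmse(predicted, measured):.3f}")
```

Running both functions over each program's predictions for the same benchmark pairs gives directly comparable per-method scores.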