Akay Ali, Reddy Hemaprakash Nanja, Galloway Roma, Kozyra Jerzy, Jackson Alexander W
Nanovery Limited, United Kingdom.
Università degli Studi di Trento, Italy.
Heliyon. 2024 Mar 21;10(7):e28443. doi: 10.1016/j.heliyon.2024.e28443. eCollection 2024 Apr 15.
Dynamic DNA nanotechnology is driving exciting developments in molecular computing, cargo delivery, sensing and detection. Combining this innovative area of research with progress in machine learning will aid the design of sophisticated DNA machinery. Herein, we present a novel framework, based on a transformer architecture and a deep learning model, that predicts the rate constant of toehold-mediated strand displacement, the underlying process in dynamic DNA nanotechnology. First, a dataset of 4450 DNA sequences and corresponding rate constants was generated using KinDA. Subsequently, a 1D convolutional neural network was trained on specific local features and DNA-BERT sequence embeddings to predict rate constants. During testing, the trained deep learning model predicted toehold-mediated strand displacement rate constants with a root mean square error of 0.76. These findings demonstrate that DNA-BERT can improve prediction accuracy, negating the need for extensive computational simulations or experimentation. Finally, the impact of various local features on model training is discussed, and a detailed comparison between the one-hot encoding and DNA-BERT sequence representation methods is presented.
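The abstract contrasts one-hot encoding with DNA-BERT sequence embeddings and reports the 1D CNN's test error as a root mean square error. The plain-Python sketch below illustrates those building blocks under stated assumptions. The base ordering, the value of k, the filter weights and the example sequence are illustrative choices, not the authors' settings. `kmer_tokens` reproduces only the overlapping k-mer tokenization used by the original DNABERT model, not the pretrained embedding itself. `conv1d_valid` shows what a single convolutional filter computes over a one-hot matrix and is not the authors' network. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

# Column order for the one-hot encoding (an arbitrary but fixed choice).
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 matrix of 0/1 values."""
    matrix = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        row = [0] * len(BASES)
        row[BASES.index(base)] = 1
        matrix.append(row)
    return matrix


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1), as the original DNABERT tokenizer does."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def conv1d_valid(x: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> List[float]:
    """Single-filter 1D convolution (cross-correlation, no padding, stride 1).

    x has shape (length, channels); kernel has shape (width, channels).
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel, weight in enumerate(kernel[offset]):
                total += weight * x[start + offset][channel]
        out.append(total)
    return out


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    seq = "ACGTTG"  # hypothetical sequence, for illustration only
    print(one_hot(seq)[:2])        # [[1, 0, 0, 0], [0, 1, 0, 0]]
    print(kmer_tokens(seq, k=3))   # ['ACG', 'CGT', 'GTT', 'TTG']
    # Width-2 filter: scores 2 where a G is followed by a T, 1 on a partial match.
    gt_filter = [[0, 0, 1, 0], [0, 0, 0, 1]]
    print(conv1d_valid(one_hot(seq), gt_filter))  # [0.0, 0.0, 2.0, 1.0, 0.0]
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

One-hot encoding treats each base independently and leaves it to the convolution filters to learn local motifs. In contrast, overlapping k-mer tokens give a pretrained language model such as DNA-BERT short context windows that it can map to learned embeddings. The abstract does not state the scale of the reported RMSE of 0.76, so `rmse` simply operates on whatever target values it is given.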