Du Haigui, Zhou Shihua, Yan WeiQi, Wang Sijie
Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China.
School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand.
Curr Issues Mol Biol. 2023 Apr 18;45(4):3573-3590. doi: 10.3390/cimb45040233.
As society becomes increasingly digitized, the volume of data has grown so much that traditional storage media can no longer meet current storage requirements. Because of its high storage capacity and durability, deoxyribonucleic acid (DNA) is considered the most promising storage medium for addressing this problem. Synthesis is an important step in DNA storage, and low-quality DNA coding can increase errors during sequencing, which reduces storage efficiency. To reduce errors caused by the poor stability of DNA sequences during storage, this paper proposes a method that uses double-matching and error-pairing constraints to improve the quality of the DNA coding set. First, the double-matching and error-pairing constraints are defined to address sequences that undergo self-complementary reactions in solution and are prone to mismatches at the 3' end. Second, two strategies are introduced into the arithmetic optimization algorithm: a random perturbation of the elementary function and a double adaptive weighting strategy. The resulting improved arithmetic optimization algorithm (IAOA) is used to construct DNA coding sets. Experimental results on 13 benchmark functions show that the IAOA significantly improves on existing algorithms in both exploration and exploitation. The IAOA is then applied to DNA encoding design under both the traditional and the new constraints, and the resulting DNA coding sets are evaluated in terms of the number of hairpins and the melting temperature. The lower bounds of the DNA storage coding sets constructed in this study are improved by 77.7% compared to existing algorithms. The DNA sequences in the storage sets show a reduction of 9.7-84.1% in melting temperature variance, and the hairpin structure ratio is reduced by 2.1-80%. The results indicate that the stability of the DNA coding sets is improved under the two proposed constraints compared to traditional constraints.
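The two quality measures named above, melting temperature variance and hairpin occurrence, can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not define the double-matching or error-pairing constraints, so they are not implemented here. The sketch assumes the Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in melting temperature model and a naive stem search as a stand-in hairpin test. All function names and the `stem` and `min_loop` parameters are hypothetical.

```python
from statistics import pvariance

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C via the Wallace rule.

    Assumption: a simple approximation for short oligonucleotides; the
    paper may use a different (e.g. nearest-neighbor) model.
    """
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def has_hairpin(seq: str, stem: int = 3, min_loop: int = 3) -> bool:
    """Naive hairpin test (illustrative only).

    Reports True if some window of length `stem` has its reverse
    complement starting at least `min_loop` bases downstream.
    """
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        if seq.find(target, i + stem + min_loop) != -1:
            return True
    return False


def tm_variance(seqs: list[str]) -> float:
    """Population variance of the melting temperatures of a coding set."""
    return pvariance([wallace_tm(s) for s in seqs])


def hairpin_ratio(seqs: list[str]) -> float:
    """Fraction of sequences in the set flagged by the naive hairpin test."""
    return sum(has_hairpin(s) for s in seqs) / len(seqs)
```

Under these assumptions, a coding set with a lower `tm_variance` and a lower `hairpin_ratio` would count as more stable in the sense the abstract describes.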