Ashlock Daniel, Houghten Sheridan K, Brown Joseph Alexander, Orth John
Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario, Canada.
Biosystems. 2012 Oct;110(1):1-8. doi: 10.1016/j.biosystems.2012.06.005. Epub 2012 Jul 6.
DNA error-correcting codes over the edit metric consist of embeddable markers for sequencing projects that are tolerant of sequencing errors. When a genetic library has multiple sources for its sequences, the use of embedded markers permits tracking of sequence origin. This study compares different methods for synthesizing DNA error-correcting codes. A new code-finding technique called the salmon algorithm is introduced and used to improve the size of the best known codes in five difficult cases of the problem, including the most studied case: length six, distance three codes. An updated table of the best known code sizes with 36 improved values, resulting from three different algorithms, is presented. Mathematical background results for the problem from multiple sources are summarized. A discussion of practical details that arise in application, including biological design and decoding, is also given in this study.
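The abstract does not describe how the salmon algorithm works, so the sketch below does not implement it. It only illustrates the basic objects the abstract refers to, under stated assumptions: the edit metric taken as standard Levenshtein distance (single-base insertions, deletions, and substitutions) over the alphabet ACGT; a check that a set of words forms a code of minimum distance d; a simple greedy, lexicode-style construction used here purely as a baseline stand-in (it is not one of the paper's methods); and nearest-codeword decoding. All function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no GC-content or other biological constraints


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-base insertions,
    deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def min_distance(code):
    """Smallest pairwise edit distance in the code (inf if it has fewer than 2 words)."""
    words = list(code)
    return min((edit_distance(u, v)
                for i, u in enumerate(words) for v in words[i + 1:]),
               default=float("inf"))


def greedy_code(n: int, d: int):
    """Hypothetical baseline: scan all 4**n words of length n in lexicographic
    order and keep each word whose edit distance to every kept word is >= d."""
    code = []
    for letters in product(ALPHABET, repeat=n):
        w = "".join(letters)
        if all(edit_distance(w, c) >= d for c in code):
            code.append(w)
    return code


def decode(received: str, code):
    """Nearest-codeword decoding: return the codeword closest to the read."""
    return min(code, key=lambda c: edit_distance(received, c))
```

For example, `code = greedy_code(6, 3)` builds a length six, distance three code; `min_distance(code) >= 3` holds by construction. Because any two codewords are at least 3 apart, a read containing a single insertion, deletion, or substitution is at distance 1 from its true marker and at least 2 from every other codeword, so `decode` recovers the marker uniquely. A greedy scan like this gives only a starting point; the abstract's improved code sizes come from the search methods the paper compares.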