Volkel Kevin D, Lin Kevin N, Hook Paul W, Timp Winston, Keung Albert J, Tuck James M
Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States.
Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States.
Bioinformatics. 2023 Oct 3;39(10). doi: 10.1093/bioinformatics/btad572.
DNA-based data storage is a quickly growing field that aims to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Because no common infrastructure exists for simulating these systems, comparing the many proposed designs under many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems. FrameD leverages the underlying modularity of DNA storage system designs to provide a framework in which different designs can be expressed while reusing common components.
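To illustrate the idea of composing a storage design from reusable components, here is a minimal sketch in which a design is an ordered list of stages, each with an `encode` and a matching `decode`. The names `Stage`, `Pipeline`, `BytesToBases` and `Chunker` are hypothetical and are not FrameD's actual API. The 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T) and the fixed strand length are likewise assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Protocol

BASES = "ACGT"  # assumed 2-bit mapping: 00->A, 01->C, 10->G, 11->T


class Stage(Protocol):
    """Hypothetical interface: every stage can encode and invert its own encoding."""
    def encode(self, data): ...
    def decode(self, data): ...


class BytesToBases:
    """Hypothetical stage: map each byte to four nucleotides, most significant bits first."""
    def encode(self, data: bytes) -> str:
        return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))

    def decode(self, seq: str) -> bytes:
        vals = [BASES.index(base) for base in seq]
        return bytes(
            (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
            for i in range(0, len(vals), 4)
        )


@dataclass
class Chunker:
    """Hypothetical stage: split one long sequence into strands of at most strand_len bases."""
    strand_len: int

    def encode(self, seq: str) -> List[str]:
        return [seq[i:i + self.strand_len] for i in range(0, len(seq), self.strand_len)]

    def decode(self, strands: List[str]) -> str:
        return "".join(strands)


class Pipeline:
    """Apply stages in order to encode, and in reverse order to decode."""
    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)

    def encode(self, data):
        for stage in self.stages:
            data = stage.encode(data)
        return data

    def decode(self, data):
        for stage in reversed(self.stages):
            data = stage.decode(data)
        return data
```

For example, `Pipeline([BytesToBases(), Chunker(8)]).encode(b"Hi!")` yields `["CAGACGGC", "AGAC"]`. A second design that differs only in strand length can reuse the same `BytesToBases` component, for instance `Pipeline([BytesToBases(), Chunker(5)])`. The point of this design choice is that swapping one stage does not require rewriting the others.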
We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that use strand copies differently: some align the copies using multiple sequence alignment algorithms, and others do not. We found that whether including multiple sequence alignment in the pipeline helps depends on the error rate and on the type of errors being injected, and that it is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to handle the large number of reads produced by sequencing and the many fault injection iterations required. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design of future DNA-based storage systems.
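Why the type of injected error matters for strand-copy handling can be seen with a toy, alignment-free consensus. The sketch below takes a simple position-by-position majority vote over copies of one strand. It is not FrameD's code and not a multiple sequence alignment; `positional_majority` is a hypothetical helper, and the reads are hand-made examples rather than data from the case study.

```python
from collections import Counter
from typing import List


def positional_majority(reads: List[str], length: int) -> str:
    """Alignment-free consensus: majority base at each index across all reads.

    Reads shorter than `length` simply stop voting at their end; positions
    with no votes are filled with 'N'. Because no alignment is done, an
    insertion or deletion shifts every later base of that read out of register.
    """
    consensus = []
    for i in range(length):
        votes = Counter(read[i] for read in reads if i < len(read))
        consensus.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(consensus)


reference = "ACGTACGT"

# Each copy carries one substitution at a different position: reads stay in register.
substituted = ["ACGTACGA", "TCGTACGT", "ACGAACGT"]

# Two copies carry one deletion each: later bases shift left by one position.
deleted = ["CGTACGT", "ACTACGT", "ACGTACGT"]
```

Here `positional_majority(substituted, 8)` recovers `"ACGTACGT"`, whereas `positional_majority(deleted, 8)` returns `"ACTACGTT"`, because the deletions misalign the copies. Aligning the copies first is one way to address indels, but it adds cost and, as the case study reports, is not always beneficial.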
The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).