Composite Hedges Nanopores codec system for rapid and portable DNA data readout with high INDEL-Correction.

Affiliations

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.

Publication information

Nat Commun. 2024 Oct 30;15(1):9395. doi: 10.1038/s41467-024-53455-3.

DOI:10.1038/s41467-024-53455-3

PMID:39477940

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11525716/

Abstract

Reading digital information from highly dense yet lightweight DNA media currently relies on time-consuming next-generation sequencing. Nanopore sequencing promises to overcome this efficiency problem, but its high indel error rates mean that large amounts of high-quality data are required for accurate readout. Here we introduce Composite Hedges Nanopores, capable of handling indel rates up to 15.9% and substitution rates up to 7.8%. The overall information density can be doubled from 0.59 to 1.17 by utilizing a degenerate eight-letter alphabet. We demonstrate that sequencing times of 20 and 120 minutes are sufficient for processing representative text and image files, respectively. Moreover, complete data recovery is estimated to require 4× and 8× physical redundancy of composite strands for text and image data, respectively. Our codec system excels in both molecular design and equalized dictionary usage, laying a solid foundation for approaching real-time DNA data retrieval and encoding.
