Zhang Anxun, Wang Longjie, Zhai Xiaowei, Xiao Yao, Wu Yanchan, Zhao Yongxi, Liu Kai, Zheng Ji-Shen, Chen Dong
Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310003, P. R. China.
College of Energy Engineering and State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, Zhejiang, 310003, P. R. China.
Adv Sci (Weinh). 2025 Jul;12(27):e2503790. doi: 10.1002/advs.202503790. Epub 2025 Apr 26.
Peptides are natural information-bearing media and are promising for high-density data storage. However, conventional mapping assigns one binary code to each amino acid (AA), so coding density can be raised only by increasing the total number of distinct AAs, which limits how far it can improve. Here, a novel composite mapping strategy is developed in which each position in the peptide sequence is a composite letter consisting of several different AAs. Because thousands of composite letters are available for mapping, this strategy breaks the limit of conventional mapping. With 20 different AAs, six-AA composite mapping achieves a coding density of 15 bits/letter, whereas conventional mapping reaches only 4 bits/AA. The complete workflow is successfully demonstrated for the first time: encoding data into composite letter sequences, synthesizing these sequences by solid-phase peptide synthesis, sequencing them by mass spectrometry, and decoding the data from them. Composite mapping also offers several distinct advantages, including high coding density, few synthesis cycles, high reliability against errors, low probability of homopolymers, and good compatibility with other encoding algorithms. The developed composite mapping strategy thus provides a new way for peptide-based data storage to increase coding density and reduce synthesis cycles, showing great potential for large-scale data storage.