Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method.

Affiliations

Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.

Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.

Publication information

Comput Biol Med. 2023 Nov;166:107548. doi: 10.1016/j.compbiomed.2023.107548. Epub 2023 Oct 2.

DOI: 10.1016/j.compbiomed.2023.107548
PMID: 37801922
Abstract

BACKGROUND

In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences can interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect has seldom been studied.

METHOD

Because the degree of secondary structure in a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths using randomly generated DNA sequences. We then construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences.
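The first step of the method, sampling random sequences at several encoding lengths and summarizing their free-energy distribution, might be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not name the folding tool, so the energy calculation is passed in as a callable (`energy_fn`), and `toy_energy` is a hypothetical stand-in with no thermodynamic meaning. All function names here are assumptions.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

BASES = "ACGT"


def random_dna(length: int, rng: random.Random) -> str:
    """Return a uniformly random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def skewness(values: Sequence[float]) -> float:
    """Population skewness: mean cubed deviation divided by sigma cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def energy_summary(
    lengths: Sequence[int],
    n_per_length: int,
    energy_fn: Callable[[str], float],
    seed: int = 0,
) -> Dict[int, Dict[str, float]]:
    """Sample random sequences per length and summarize their energies."""
    rng = random.Random(seed)
    summary: Dict[int, Dict[str, float]] = {}
    for length in lengths:
        energies: List[float] = [
            energy_fn(random_dna(length, rng)) for _ in range(n_per_length)
        ]
        summary[length] = {
            "mean": statistics.fmean(energies),
            "skewness": skewness(energies),
        }
    return summary


def toy_energy(seq: str) -> float:
    """Hypothetical placeholder: NOT a folding model, only a runnable stand-in."""
    return 0.1 * sum(base in "GC" for base in seq)


if __name__ == "__main__":
    for length, stats in energy_summary([50, 100, 150], 500, toy_energy).items():
        print(length, round(stats["mean"], 3), round(stats["skewness"], 3))
```

In practice `energy_fn` would wrap a real secondary-structure folding package that returns the free-energy magnitude of a sequence; the summary loop itself would stay the same.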

RESULTS

Our simulation results indicate that the free energy of DNA sequences of a given length follows a right-skewed distribution, and that the mean increases with length. Given a tolerable free-energy threshold of 20 kcal/mol, choosing an encoding length of 100 nt keeps the ratio of sequences with serious secondary structures within the 1% significance level. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²), reaching MRE = 0.109 and R² = 0.918 in the simulation experiment. The combination of the BiLSTM and the attention module can handle long-term dependencies and capture the features of base pairing. Furthermore, prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, on a real dataset, 70 of 94 sequences can be screened out by their predicted free energy. This demonstrates that the proposed model can screen out highly suspicious sequences that are prone to more errors and low sequencing copy numbers.
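The two reported metrics and the threshold-based screening can be illustrated with a short sketch. It assumes the textbook definitions of mean relative error and the coefficient of determination, since the abstract does not spell out the paper's exact formulas, and it treats free energy as a magnitude compared against the 20 kcal/mol threshold. The function names are hypothetical.

```python
from typing import List, Sequence


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |prediction - truth| / |truth| (truth values must be non-zero)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def screen(pred_energies: Sequence[float], threshold: float = 20.0) -> List[int]:
    """Indices of sequences whose predicted energy magnitude exceeds the threshold."""
    return [i for i, e in enumerate(pred_energies) if abs(e) > threshold]


def flagged_ratio(pred_energies: Sequence[float], threshold: float = 20.0) -> float:
    """Fraction of sequences flagged as having severe secondary structure."""
    return len(screen(pred_energies, threshold)) / len(pred_energies)


if __name__ == "__main__":
    truth = [10.0, 20.0, 30.0]
    preds = [11.0, 18.0, 33.0]
    print(mean_relative_error(truth, preds), r_squared(truth, preds))
    print(screen([5.0, 25.0, -30.0, 20.0]))
```

A flagged ratio computed this way over candidate encoding lengths is one way to check whether a given length keeps the share of high-energy sequences below a chosen bound such as 1%.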

