Climaco Paolo, Mitchell Noelle M, Tyler Matthew J, Yang Kyungae, Andrews Anne M, Bertozzi Andrea L
Institut für Numerische Simulation, University of Bonn, Bonn, 53115, NRW, Germany.
Department of Chemistry and Biochemistry, Los Angeles, 90095, CA, USA.
Math Biosci. 2025 Sep;387:109485. doi: 10.1016/j.mbs.2025.109485. Epub 2025 Jun 27.
Aptamers are oligonucleotide receptors that bind their targets with high affinity. Here, we consider aptamers composed of single-stranded DNA that undergo target-binding-induced conformational changes, giving rise to unique secondary and tertiary structures. Given a specific aptamer primary sequence, well-established computational tools (notably mfold) predict the secondary structure via free-energy minimization algorithms. While mfold generates secondary structures for individual sequences, aptamer selections produce candidate pools too large to be interrogated experimentally, so there is a need for a high-throughput process in which thousands of DNA structures can be predicted in real time for use in an interactive setting. We developed new Python code for high-throughput aptamer secondary structure determination (GMfold). GMfold uses subgraph matching methods to group aptamer candidates by secondary structure similarity. We also improved an open-source code, SeqFold, to incorporate subgraph matching concepts. We represent each secondary structure as a lowest-energy bipartite subgraph matching of the DNA graph to itself. These new tools enable thousands of DNA sequences to be compared on the basis of their secondary structures using machine-learning algorithms. This process is advantageous when analyzing sequences that arise from aptamer selections via systematic evolution of ligands by exponential enrichment (SELEX). This work is a building block for future machine-learning-informed DNA aptamer selection processes that identify aptamers with improved target affinity and selectivity and that advance aptamer biosensors and therapeutics.
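To make the idea of a secondary structure as a matching of the DNA graph to itself more concrete, here is a minimal illustrative sketch. It is not GMfold or SeqFold code and does no free-energy minimization. It treats a structure as a set of base pairs (i, j) in which each nucleotide appears at most once, parses standard dot-bracket notation into such a set, and checks Watson-Crick complementarity. The minimum hairpin loop of 3 unpaired bases and the Jaccard base-pair similarity used for comparing structures are assumptions chosen for illustration. The function names are hypothetical.

```python
# Illustrative sketch only (not the GMfold/SeqFold implementation): a DNA
# secondary structure viewed as a matching, i.e. a set of base pairs (i, j)
# in which every nucleotide is paired with at most one partner.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def dot_bracket_to_matching(structure: str) -> set[tuple[int, int]]:
    """Convert dot-bracket notation into 0-based base pairs (i, j) with i < j."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_valid_matching(seq: str, pairs: set[tuple[int, int]], min_loop: int = 3) -> bool:
    """Check that pairs form a matching of complementary bases.

    min_loop (assumed value 3) is the minimum number of unpaired bases
    enclosed by any pair.
    """
    used: set[int] = set()
    for i, j in pairs:
        if i in used or j in used:
            return False  # a nucleotide may pair with at most one partner
        if j - i - 1 < min_loop:
            return False  # enclosed loop too short
        if (seq[i], seq[j]) not in WATSON_CRICK:
            return False  # non-complementary bases
        used.update((i, j))
    return True


def pair_similarity(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> float:
    """Jaccard similarity of two base-pair sets (hypothetical comparison metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    seq = "GGGAAAACCC"
    hairpin = dot_bracket_to_matching("(((....)))")
    shorter = dot_bracket_to_matching("((......))")
    print(sorted(hairpin), is_valid_matching(seq, hairpin))
    print(pair_similarity(hairpin, shorter))
```

For example, the hairpin "(((....)))" on GGGAAAACCC yields the pairs (0, 9), (1, 8), (2, 7), and it shares two of its three pairs with "((......))", giving a similarity of 2/3. A set-based representation like this makes pairwise comparison cheap, which is the property that matters when thousands of candidates must be grouped.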