Eremeyeva M, Din Y, Shirokii N, Serov N
International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, Saint-Petersburg, Russian Federation, 191002.
BMC Bioinformatics. 2025 Jan 6;26(1):2. doi: 10.1186/s12859-024-06019-7.
Deoxyribozymes, or DNAzymes, are short artificial DNA sequences with diverse catalytic properties. In particular, DNAzymes that cleave RNA have great potential in gene therapy and in the sequence-specific analytical detection of disease markers. This activity is provided by catalytic cores that perform site-specific hydrolysis of a phosphodiester bond in the RNA substrate. However, the vast majority of existing DNAzyme catalytic cores show low efficacy in in vivo experiments. SELEX, which relies on in vitro screening, involves long and expensive selection cycles with an average success rate of ~30%, and it does not allow direct selection of chemically modified DNAzymes, which have previously been shown to be more active in vivo. There is therefore a strong need for an in silico approach to the exploratory analysis of RNA-cleaving DNAzyme cores that would substantially ease the discovery of novel catalytic cores with superior activity.
In this work, we develop SequenceCraft, an open-source machine learning platform that allows experimental scientists to perform exploratory analysis of DNAzymes through quantitative estimation of the observed rate constant (k_obs), as well as statistical and clustering analysis of the data. This was made possible by three components: a unique curated database of >350 RNA-cleaving catalytic cores; property-based sequence representations that handle both conventional and chemically modified nucleotides; and an optimized k_obs prediction algorithm that achieves Q² > 0.9 on experimental data published to date.
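To make the two technical ideas above more concrete, here is a minimal sketch, not SequenceCraft's actual code. It assumes that a "property-based representation" maps each nucleotide token (conventional or chemically modified) to a vector of physicochemical descriptors and averages them into a fixed-length feature vector, and that the reported quality metric is the cross-validated coefficient of determination, Q² = 1 − PRESS/TSS, computed from out-of-fold predictions. The descriptor names and values in `PROPERTIES`, the modified-nucleotide token `"mA"`, and the functions `featurize` and `q_squared` are hypothetical placeholders chosen for illustration.

```python
from statistics import mean

# Hypothetical per-nucleotide descriptors (illustrative placeholders, NOT the
# descriptors used by SequenceCraft):
# (hydrophobicity, h_bond_donors, h_bond_acceptors)
PROPERTIES = {
    "A": (0.50, 1, 3),
    "C": (0.30, 1, 2),
    "G": (0.40, 2, 3),
    "T": (0.60, 1, 2),
    "mA": (0.70, 1, 3),  # hypothetical token for a chemically modified adenosine
}


def featurize(tokens):
    """Map a tokenized sequence to a fixed-length vector of per-property means.

    Tokens are passed as a list so that multi-character modified-nucleotide
    labels are handled the same way as the four canonical bases.
    """
    rows = [PROPERTIES[t] for t in tokens]
    n_props = len(rows[0])
    return [mean(row[i] for row in rows) for i in range(n_props)]


def q_squared(y_true, y_pred_cv):
    """Q^2 = 1 - PRESS / TSS, using out-of-fold (cross-validated) predictions."""
    y_bar = mean(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred_cv))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - press / tss


if __name__ == "__main__":
    print(featurize(["G", "mA", "T"]))      # one descriptor mean per property
    print(q_squared([1, 2, 3], [1, 2, 3]))  # perfect out-of-fold predictions
```

Averaging descriptors keeps the feature length independent of sequence length and lets a modified nucleotide enter the model simply by adding a row of descriptors, at the cost of discarding positional information. A real predictor of k_obs would likely need position-aware features.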
This work represents a significant advancement in DNAzyme research, providing a tool for more efficient discovery of RNA-cleaving DNAzymes. The SequenceCraft platform offers an in silico alternative to traditional experimental approaches, potentially accelerating the development of DNAzymes.