Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning.

Author information

Moussa Siba, Kilgour Michael, Jans Clara, Hernandez-Garcia Alex, Cuperlovic-Culf Miroslava, Bengio Yoshua, Simine Lena

Affiliations

Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, Quebec H3A 0B8, Canada.

Montreal Institute for Learning Algorithms, 6666 St. Urbain, #200, Montreal, Quebec H2S 3H1, Canada.

Publication information

J Phys Chem B. 2023 Jan 12;127(1):62-68. doi: 10.1021/acs.jpcb.2c05660. Epub 2022 Dec 27.

DOI:10.1021/acs.jpcb.2c05660

PMID:36574492

Abstract

Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria. Relevant criteria may be, for example, the presence of specific folding motifs, binding to molecular ligands, sensing properties, and so on. Most practical approaches to aptamer design identify a small set of promising candidate sequences using high-throughput experiments (e.g., SELEX) and then optimize performance by introducing only minor modifications to the empirically found candidates. Sequences that possess the desired properties but differ drastically in chemical composition will add diversity to the search space and facilitate the discovery of useful nucleic acid aptamers. Systematic diversification protocols are needed. Here we propose to use an unsupervised machine learning model known as the Potts model to discover new, useful sequences with controllable sequence diversity. We start by training a Potts model using the maximum entropy principle on a small set of empirically identified sequences unified by a common feature. To generate new candidate sequences with a controllable degree of diversity, we take advantage of the model's spectral feature: an "energy" bandgap separating sequences that are similar to the training set from those that are distinct. By controlling the Potts energy range that is sampled, we generate sequences that are distinct from the training set yet still likely to have the encoded features. To demonstrate performance, we apply our approach to design diverse pools of sequences with specified secondary structure motifs in 30-mer RNA and DNA aptamers.
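The energy-window idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fields `h` and couplings `J` below are random placeholders rather than parameters fit by maximum entropy, and the sign convention (lower energy meaning closer to the training set), the function names, and the Metropolis sampler are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACGU"  # RNA alphabet; swap in "ACGT" for DNA


def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j]."""
    idx = [ALPHABET.index(c) for c in seq]
    n = len(idx)
    e = -sum(h[i][idx[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j][idx[i]][idx[j]]
    return e


def random_params(n, rng, scale=0.5):
    """Placeholder parameters (assumption); a real model is fit to data."""
    q = len(ALPHABET)
    h = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(n)]
    J = [[[[rng.gauss(0, scale) for _ in range(q)] for _ in range(q)]
          for _ in range(n)] for _ in range(n)]
    return h, J


def sample_energy_window(start, h, J, e_lo, e_hi, n_steps, rng, beta=1.0):
    """Metropolis walk over single-site mutations.

    Returns the distinct visited sequences with e_lo <= E <= e_hi.
    """
    seq = list(start)
    e = potts_energy(seq, h, J)
    kept = set()
    for _ in range(n_steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        e_new = potts_energy(seq, h, J)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new
        else:
            seq[pos] = old  # reject: restore the previous base
        if e_lo <= e <= e_hi:
            kept.add("".join(seq))
    return sorted(kept)


rng = random.Random(0)
h, J = random_params(8, rng)
start = "".join(rng.choice(ALPHABET) for _ in range(8))
e0 = potts_energy(start, h, J)
# Hypothetical window around the starting energy; shifting it upward would
# target sequences that are less similar to the training set.
pool = sample_energy_window(start, h, J, e0 - 5.0, e0 + 1.0, 500, rng)
```

Moving the window `[e_lo, e_hi]` is the control knob here: under the stated sign convention, a window near the training-set energies yields close variants, while a higher window yields more divergent candidates.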

Similar articles

Improving aptamer performance with nucleic acid mimics: de novo and post-SELEX approaches.

Trends Biotechnol. 2022 May;40(5):549-563. doi: 10.1016/j.tibtech.2021.09.011. Epub 2021 Oct 28.

Constructive Prediction of Potential RNA Aptamers for a Protein Target.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1476-1482. doi: 10.1109/TCBB.2019.2951114. Epub 2019 Nov 4.

RaptRanker: in silico RNA aptamer selection from HT-SELEX experiment based on local sequence and structure information.

Nucleic Acids Res. 2020 Aug 20;48(14):e82. doi: 10.1093/nar/gkaa484.

[Efficient screening for 8-oxoguanine DNA glycosylase binding aptamers via capillary electrophoresis].

Se Pu. 2021 Jul 8;39(7):721-729. doi: 10.3724/SP.J.1123.2020.12017.

Searching the Sequence Space for Potent Aptamers Using SELEX in Silico.

J Chem Theory Comput. 2015 Dec 8;11(12):5939-46. doi: 10.1021/acs.jctc.5b00707. Epub 2015 Nov 5.

An improved SELEX technique for selection of DNA aptamers binding to M-type 11 of Streptococcus pyogenes.

Methods. 2016 Mar 15;97:51-7. doi: 10.1016/j.ymeth.2015.12.005. Epub 2015 Dec 8.

In silico approaches to RNA aptamer design.

Biochimie. 2018 Feb;145:8-14. doi: 10.1016/j.biochi.2017.10.005. Epub 2017 Oct 12.

Characterisation of aptamer-target interactions by branched selection and high-throughput sequencing of SELEX pools.

Nucleic Acids Res. 2015 Dec 2;43(21):e139. doi: 10.1093/nar/gkv700. Epub 2015 Jul 10.

Chemical Modifications for a Next Generation of Nucleic Acid Aptamers.

Chembiochem. 2022 Aug 3;23(15):e202200006. doi: 10.1002/cbic.202200006. Epub 2022 Apr 29.

Cited by

Advances in Protein-RNA aptamer recognition and modeling: Current trends and future perspectives.

Curr Opin Struct Biol. 2025 Aug 14;94:103133. doi: 10.1016/j.sbi.2025.103133.

Aptamer Sequence Optimization and Its Application in Food Safety Analysis.

Foods. 2025 Jul 26;14(15):2622. doi: 10.3390/foods14152622.

In vitro selection of aptamers and their applications.

Nat Rev Methods Primers. 2023;3. doi: 10.1038/s43586-023-00247-6. Epub 2023 Jul 20.

Statistical Analysis and Tokenization of Epitopes to Construct Artificial Neoepitope Libraries.

ACS Synth Biol. 2023 Oct 20;12(10):2812-2818. doi: 10.1021/acssynbio.3c00201. Epub 2023 Sep 13.
