siRNA Features: Automated Machine Learning of 3D Molecular Fingerprints and Structures for Therapeutic Off-Target Data.

Authors

Richter Michael, Admasu Alem

Affiliations

Department of Chemistry, Binghamton University, Binghamton, NY 13902, USA.

Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA.

Publication information

Int J Mol Sci. 2025 Jul 16;26(14):6795. doi: 10.3390/ijms26146795.

Abstract

Chemical modifications are the standard for small interfering RNAs (siRNAs) in therapeutic applications, but predicting their off-target effects remains a significant challenge. Current approaches often rely on sequence-based encodings, which fail to fully capture the structural and protein-RNA interaction details critical for off-target prediction. In this study, we developed a framework to generate reproducible structure-based chemical features, incorporating both molecular fingerprints and computationally derived siRNA-hAgo2 complex structures. Using an RNA-Seq off-target study, we generated over 30,000 siRNA-gene data points and systematically compared nine distinct types of feature representation strategies. Among the datasets, the highest predictive performance was achieved by Dataset 3, which used extended connectivity fingerprints (ECFPs) to encode siRNA and mRNA features. An energy-minimized dataset (7R), representing siRNA-hAgo2 structural alignments, was the second-best performer, underscoring the value of incorporating reproducible structural information into feature engineering. Our findings demonstrate that combining detailed structural representations with sequence-based features enables the generation of robust, reproducible chemical features for machine learning models, offering a promising path forward for off-target prediction and siRNA therapeutic design that can be seamlessly extended to include any modification, such as clinically relevant 2'-F or 2'-OMe.
