School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211198, Nanjing, China.
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong SAR, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae563.
The clinical adoption of small interfering RNAs (siRNAs) has prompted the development of various computational strategies for siRNA design, from traditional data analysis to advanced machine learning techniques. However, previous studies have inadequately considered the full complexity of the siRNA silencing mechanism, neglecting critical elements such as siRNA positioning on mRNA, RNA base-pairing probabilities, and RNA-AGO2 interactions, thereby limiting the insight and accuracy of existing models. Here, we introduce siRNADiscovery, a Graph Neural Network (GNN) framework that leverages both non-empirical and empirical rule-based features of siRNA and mRNA to effectively capture the complex dynamics of gene silencing. On multiple internal datasets, siRNADiscovery achieves state-of-the-art performance. Significantly, siRNADiscovery also outperforms existing methodologies in in vitro studies and on an externally validated dataset. Additionally, we develop a new data-splitting methodology that addresses the data leakage issue, a frequently overlooked problem in previous studies, ensuring the robustness and stability of our model under various experimental settings. Through rigorous testing, siRNADiscovery has demonstrated remarkable predictive accuracy and robustness, making significant contributions to the field of gene silencing. Furthermore, our approach to redefining data-splitting standards aims to set new benchmarks for future research in the domain of predictive biological modeling for siRNA.