Anuarbekov Alikhan, Kléma Jiří
Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 16627, Prague, Czech Republic.
BMC Bioinformatics. 2025 May 27;26(1):139. doi: 10.1186/s12859-025-06161-w.
Current experimental data on RNA interactions remain limited, particularly for non-coding RNAs, many of which have only recently been discovered and operate within complex regulatory networks. Researchers often rely on in-silico interaction detection algorithms, such as TargetScan, which are based on biochemical sequence alignment. However, these algorithms have limited performance. RNA-seq expression data can provide valuable insights into regulatory networks, especially for understudied interactions such as circRNA-miRNA-mRNA. By integrating RNA-seq data with prior interaction networks obtained experimentally or through in-silico predictions, researchers can discover novel interactions, validate existing ones, and improve interaction prediction accuracy.
This paper introduces Pi-GMIFS, an extension of the generalized monotone incremental forward stagewise (GMIFS) regression algorithm that incorporates prior knowledge. The algorithm first estimates prior response values through a prior-only regression, interpolates between these prior values and the original data, and then applies the GMIFS method. Our experimental results on circRNA-miRNA-mRNA regulatory interaction networks demonstrate that Pi-GMIFS consistently enhances precision and recall in RNA interaction prediction by leveraging implicit information from bulk RNA-seq expression data, outperforming the initial prior knowledge.
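The three steps described above (prior-only regression, interpolation, stagewise fit) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear model with squared loss and uses plain incremental forward stagewise regression as a stand-in for GMIFS. The function names, the mixing weight `alpha`, the step size `eps`, and the boolean `prior_mask` encoding of prior interactions are all hypothetical choices made for illustration.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Plain incremental forward stagewise regression (linear, squared loss).

    Stand-in for GMIFS: at each step, nudge the coefficient of the predictor
    most correlated with the current residual by a small amount eps.
    """
    X = np.asarray(X, dtype=float)
    residual = np.asarray(y, dtype=float).copy()
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        corr = X.T @ residual
        j = int(np.argmax(np.abs(corr)))
        if abs(corr[j]) < 1e-8:  # nothing left to explain
            break
        step = eps * np.sign(corr[j])
        beta[j] += step
        residual -= step * X[:, j]
    return beta


def prior_informed_stagewise(X, y, prior_mask, alpha=0.5, eps=0.01, n_steps=2000):
    """Hypothetical sketch of the prior-then-interpolate-then-stagewise idea.

    prior_mask: boolean vector, True where prior knowledge suggests an interaction.
    alpha: assumed mixing weight (0 = ignore the prior, 1 = trust it fully).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mask = np.asarray(prior_mask, dtype=bool)
    if not prior_mask.any():
        # No prior information: fall back to the plain stagewise fit.
        return forward_stagewise(X, y, eps=eps, n_steps=n_steps)
    # Step 1: prior-only regression, using only predictors supported by the prior.
    coef, *_ = np.linalg.lstsq(X[:, prior_mask], y, rcond=None)
    y_prior = X[:, prior_mask] @ coef
    # Step 2: interpolate between the prior response and the observed response.
    y_mix = alpha * y_prior + (1.0 - alpha) * y
    # Step 3: run the stagewise procedure on the interpolated response.
    return forward_stagewise(X, y_mix, eps=eps, n_steps=n_steps)
```

In this sketch, a predictor backed by the prior keeps most of its effect in the interpolated response, while a predictor absent from the prior has its contribution scaled down by roughly `1 - alpha`, so it must be well supported by the expression data to enter the model with a large coefficient.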
Pi-GMIFS is a robust algorithm for inferring acyclic interaction networks when the variable ordering is known. Its effectiveness was confirmed through extensive experimental validation. We showed that RNA-seq data of a representative size help infer previously unknown interactions that are available in TarBase v9 and improve the quality of circRNA disease annotation.