Derinkok Yildiz, Wang Haiqi, Tjaden Brian
Department of Computer Science, Wellesley College, Wellesley, MA 02481, United States.
NAR Genom Bioinform. 2025 May 8;7(2):lqaf055. doi: 10.1093/nargab/lqaf055. eCollection 2025 Jun.
Small regulatory RNAs (sRNAs) are widespread in bacteria. However, characterizing the targets of sRNA regulation in a way that scales with the increasing number of identified sRNAs has proven challenging. Computational methods offer one means of characterizing sRNA targets efficiently, but the sensitivity and precision of such methods are limited. Here, we investigate whether publicly available expression data from RNA-seq experiments can improve the accuracy of computational prediction of sRNA regulatory targets. Using two compendia, of 2143 and 177 RNA-seq samples respectively, we identify groups of co-expressed genes in each organism and incorporate this expression information into machine-learning-based computational prediction of sRNA targets. We find that integrating expression information significantly improves the accuracy of computational results. Further, we observe that computational methods perform better when trained on smaller, higher-quality sets of targets than on larger, noisier sets of targets identified by high-throughput methods.
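The general idea of turning co-expression into a feature for a target predictor can be illustrated with a minimal sketch. It is not the procedure used in the study: the correlation threshold, the grouping by connected components, the function names, and the toy expression values are all assumptions made for illustration. The sketch groups genes whose expression profiles are strongly correlated across samples, then scores a candidate target by the fraction of an sRNA's known targets that fall in the same group; such a score could be supplied to a classifier alongside sequence-based features.

```python
import numpy as np


def coexpressed_groups(expr, threshold=0.8):
    """Label each gene with a co-expression group.

    expr is a genes x samples array. Two genes are linked when their
    Pearson correlation is >= threshold (an assumed cutoff), and groups
    are the connected components of the resulting graph.
    """
    corr = np.corrcoef(expr)          # genes x genes correlation matrix
    adjacent = corr >= threshold      # NaN entries compare as False
    n_genes = corr.shape[0]
    labels = [-1] * n_genes
    group = 0
    for start in range(n_genes):
        if labels[start] != -1:
            continue
        labels[start] = group
        stack = [start]
        while stack:                  # depth-first walk of one component
            gene = stack.pop()
            for other in np.flatnonzero(adjacent[gene]):
                if labels[other] == -1:
                    labels[other] = group
                    stack.append(other)
        group += 1
    return labels


def expression_feature(labels, candidate, known_targets):
    """Fraction of known targets that share the candidate's group."""
    if not known_targets:
        return 0.0
    shared = sum(labels[t] == labels[candidate] for t in known_targets)
    return shared / len(known_targets)


# Toy data (hypothetical): 4 genes measured in 5 samples.
toy_expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [5.5, 3.0, 4.0, 1.0, 2.2],
])
toy_labels = coexpressed_groups(toy_expr)
# Gene 0 stands in for a known target of some sRNA.
feature_gene1 = expression_feature(toy_labels, candidate=1, known_targets=[0])
feature_gene3 = expression_feature(toy_labels, candidate=3, known_targets=[0])
```

In this toy example genes 0 and 1 land in one group and genes 2 and 3 in another, so gene 1 receives a higher expression-based score than gene 3 for an sRNA whose known target is gene 0.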