Böge Franz Leonard, Zacharias Helena U, Becker Stefanie C, Jung Klaus
Institute for Animal Genomics, University of Veterinary Medicine Hannover Foundation, Hannover, Germany.
Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Germany.
Front Bioinform. 2025 Jul 8;5:1566162. doi: 10.3389/fbinf.2025.1566162. eCollection 2025.
Since the rise of molecular high-throughput technologies, many diseases have been studied on multiple omics layers in parallel. Understanding the interplay between microRNAs (miRNAs) and their target mRNAs is important for understanding diseases at the molecular level. Although a large amount of public mRNA expression data is available for many diseases, few paired datasets with both miRNA and mRNA expression profiles exist. This study aimed to assess whether miRNA expression data can be predicted from mRNA expression data, serving as a proof of principle that such cross-omics predictions are feasible. Furthermore, current research relies on target databases in which information about miRNA-target relationships is derived from experimental and computational studies.
To make use of publicly available mRNA profiles, we investigated the ability of artificial deep neural networks and linear least absolute shrinkage and selection operator (LASSO) regression to predict unknown miRNA expression profiles. We evaluated the approach using seven paired miRNA/mRNA expression datasets, four from studies on West Nile virus infection in mouse tissues and three from human immunodeficiency virus (HIV) infection in human tissues. We assessed the performance of each model first by within-data evaluations and second by cross-study evaluations. Furthermore, we investigated whether data augmentation or separate models for data from diseased and non-diseased samples can improve the prediction performance.
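The LASSO part of this setup can be illustrated with a minimal sketch. It assumes one linear model per miRNA, fitted on mRNA expression values by coordinate descent with soft-thresholding; the toy data, the penalty value `alpha`, and all function names are hypothetical and are not taken from the study, which does not state its implementation here.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + alpha * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    # Centre predictors and response so the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # residual for the all-zero coefficient vector
    coef = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant gene, carries no information
            # Correlation of gene j with the partial residual that excludes gene j.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new_coef = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                coef[j] = new_coef
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef

def predict(intercept, coef, X):
    """Predict one miRNA's expression for each sample (row) in X."""
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in X]

# Hypothetical toy data: 6 mRNAs, one miRNA that depends on mRNA 0 and mRNA 3 only.
rng = random.Random(0)

def simulate(n_samples):
    X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(n_samples)]
    y = [2.0 - 1.5 * row[0] + 0.8 * row[3] + rng.gauss(0.0, 0.1) for row in X]
    return X, y

X_train, y_train = simulate(60)
X_test, y_test = simulate(20)

intercept, coef = fit_lasso(X_train, y_train, alpha=0.1)
y_pred = predict(intercept, coef, X_test)
mean_abs_error = sum(abs(a - b) for a, b in zip(y_test, y_pred)) / len(y_test)
print("coefficients:", [round(c, 3) for c in coef])
print("mean absolute error on held-out samples:", round(mean_abs_error, 3))
```

In this sketch the L1 penalty pulls the coefficients of uninformative mRNAs toward zero, which is the selection property that makes LASSO attractive when there are many more genes than samples. A full miRNA profile would be predicted by repeating the fit once per miRNA.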
In general, most settings achieved strong correlations at the level of individual samples. In some datasets and settings, correlations of log fold changes and p-values from differential expression analysis (DEA) between true and predicted miRNA profiles were observed. Correlation between log fold changes was also seen in a cross-study evaluation for the HIV datasets. Data augmentation consistently improved performance in neural networks, while its impact on LASSO models was not significant.
Overall, cross-omics prediction of expression profiles appears possible, with some correlations even at the level of differential expression analysis.
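The two evaluation levels mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: correlation is taken to be the Pearson coefficient, the log fold change is computed as a plain difference of group means on log2-scale values rather than with a dedicated DEA package, and the expression values and names (`true_profiles`, `predicted_profiles`, `is_infected`) are made up for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def log_fold_changes(profiles, is_infected):
    """Per miRNA: mean log2 expression in infected minus mean in control samples."""
    infected = [p for p, flag in zip(profiles, is_infected) if flag]
    control = [p for p, flag in zip(profiles, is_infected) if not flag]
    n_mirna = len(profiles[0])
    return [
        sum(p[j] for p in infected) / len(infected)
        - sum(p[j] for p in control) / len(control)
        for j in range(n_mirna)
    ]

# Hypothetical log2 expression values: rows are samples, columns are miRNAs.
true_profiles = [
    [5.0, 7.0, 3.0, 9.0],
    [5.2, 6.8, 3.1, 9.1],
    [6.5, 7.1, 2.0, 9.0],
    [6.7, 6.9, 2.2, 8.8],
]
predicted_profiles = [
    [5.3, 6.7, 3.4, 8.6],
    [5.1, 6.9, 3.3, 8.8],
    [6.0, 7.0, 2.6, 8.7],
    [6.2, 6.8, 2.5, 8.9],
]
is_infected = [False, False, True, True]

# Level 1: one correlation per sample, across all miRNAs of that sample.
sample_correlations = [
    pearson(t, p) for t, p in zip(true_profiles, predicted_profiles)
]

# Level 2: correlation of log fold changes derived from true vs. predicted data.
lfc_true = log_fold_changes(true_profiles, is_infected)
lfc_pred = log_fold_changes(predicted_profiles, is_infected)
lfc_correlation = pearson(lfc_true, lfc_pred)

print("per-sample correlations:", [round(r, 3) for r in sample_correlations])
print("log fold change correlation:", round(lfc_correlation, 3))
```

The second level is the stricter test: a predicted profile can correlate well with the true profile within each sample simply because highly expressed miRNAs stay high, while still missing the between-group differences that a differential expression analysis looks for.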