Nersisyan Stepan, Loher Phillipe, Rigoutsos Isidore
Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA 19107, United States.
Nucleic Acids Res. 2025 May 22;53(10). doi: 10.1093/nar/gkaf444.
Correcting for confounding variables is often overlooked when computing RNA-RNA correlations, even though it can profoundly affect results. We introduce CorrAdjust, a method for identifying and correcting such hidden confounders. CorrAdjust selects a subset of principal components to residualize from expression data by maximizing the enrichment of "reference pairs" among highly correlated RNA-RNA pairs. Unlike traditional machine learning metrics, this novel enrichment-based metric is specifically designed to evaluate correlation data and provides valuable RNA-level interpretability. CorrAdjust outperforms current state-of-the-art methods when evaluated on 25 063 human RNA-seq datasets from The Cancer Genome Atlas, the Genotype-Tissue Expression project, and the Geuvadis collection. In particular, CorrAdjust excels at integrating small RNA and mRNA sequencing data, significantly enhancing the enrichment of experimentally validated miRNA targets among negatively correlated miRNA-mRNA pairs. CorrAdjust, with accompanying documentation and tutorials, is available at https://tju-cmc-org.github.io/CorrAdjust.
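The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the CorrAdjust API or its actual scoring function: the function names, the simple "fraction of reference pairs among the top-k correlated pairs" score, and the restriction to removing only the first k principal components (rather than an arbitrary subset) are all assumptions made for illustration.

```python
import numpy as np

def residualize_pcs(X, pc_idx):
    """Center X (samples x genes) and project out the selected principal components."""
    Xc = X - X.mean(axis=0)
    pc_idx = list(pc_idx)
    if not pc_idx:
        return Xc
    # Rows of Vt are the principal axes (gene loadings) of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[pc_idx].T  # genes x number of removed PCs
    return Xc - Xc @ V @ V.T

def reference_enrichment(X, reference_pairs, top_k):
    """Fraction of the top_k most positively correlated gene pairs that are reference pairs.

    reference_pairs is a set of (i, j) column-index tuples with i < j.
    This simple fraction is an illustrative stand-in for an enrichment score.
    """
    corr = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(corr.shape[0], k=1)
    order = np.argsort(-corr[i, j])[:top_k]
    top = {(int(a), int(b)) for a, b in zip(i[order], j[order])}
    return len(top & reference_pairs) / top_k

def choose_num_pcs(X, reference_pairs, top_k, max_pcs):
    """Score the removal of the first k PCs for k = 0..max_pcs and return the best k.

    Simplification: only leading PCs are considered, not arbitrary subsets.
    """
    scores = [
        reference_enrichment(residualize_pcs(X, range(k)), reference_pairs, top_k)
        for k in range(max_pcs + 1)
    ]
    return int(np.argmax(scores)), scores
```

On data where a single strong confounder inflates all pairwise correlations, a search like this would be expected to prefer removing the confounder-dominated component while keeping components that carry the reference-pair signal, since removing the latter lowers the score.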