Zhang Xiaoyu
bioRxiv. 2024 May 5:2024.05.02.592266. doi: 10.1101/2024.05.02.592266.
RNA sequencing (RNA-seq) has become a cornerstone of transcriptomics, offering detailed insights into gene expression across diverse biological conditions and sample types. However, RNA-seq data often suffer from batch effects: systematic, non-biological differences between batches that compromise data reliability and obscure true biological variation. To address this problem, we introduce ComBat-ref, a refined batch effect correction method that enhances the statistical power and reliability of differential expression analysis in RNA-seq data. Building on ComBat-seq, ComBat-ref adjusts count data with a negative binomial model, but it differs in two ways: it uses a single pooled dispersion parameter for each batch as a whole, and it leaves the counts of the reference batch unchanged. ComBat-ref outperformed existing methods on both simulated data and real datasets, including the growth factor receptor network (GFRN) data and NASA GeneLab transcriptomic datasets, with significantly improved sensitivity and specificity. By mitigating batch effects while maintaining high detection power, ComBat-ref is a robust tool for improving the accuracy and interpretability of RNA-seq data analyses.
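The correction idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' ComBat-ref implementation. It assumes a genes × samples count matrix and estimates one dispersion per batch, pooled across genes with a simple method-of-moments estimator. It then maps each count outside the reference batch onto the reference batch's negative binomial distribution by matching cumulative probabilities, which is the quantile-matching idea used in ComBat-seq. The helper names, the moment estimator, and the mid-CDF matching rule are illustrative assumptions. The sketch also ignores library-size normalization and biological covariates, which a real analysis would need to model.

```python
import math
from statistics import mean, variance

EPS = 1e-8  # floor for means so log() stays finite when a gene has zero counts


def nb_logpmf(k: int, mu: float, phi: float) -> float:
    """Log-pmf of a negative binomial with mean mu and dispersion phi,
    so that variance = mu + phi * mu**2 (size r = 1/phi)."""
    r = 1.0 / phi
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def nb_cdf(k: int, mu: float, phi: float) -> float:
    """P(Y <= k); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    return min(1.0, sum(math.exp(nb_logpmf(j, mu, phi)) for j in range(k + 1)))


def nb_quantile(q: float, mu: float, phi: float) -> int:
    """Smallest k with P(Y <= k) >= q (capped to avoid an endless loop)."""
    total, k = 0.0, 0
    while True:
        total += math.exp(nb_logpmf(k, mu, phi))
        if total >= q or k >= 10**6:
            return k
        k += 1


def pooled_dispersion(gene_counts: list[list[int]]) -> float:
    """One dispersion for a whole batch (hypothetical estimator): moments
    pooled over genes, sum(var - mean) / sum(mean**2), floored at 1e-6."""
    num = sum(variance(c) - mean(c) for c in gene_counts)
    den = sum(mean(c) ** 2 for c in gene_counts)
    return max(num / den if den > 0 else 0.0, 1e-6)


def combat_ref_sketch(counts: list[list[int]], batches: list[str],
                      ref: str) -> list[list[int]]:
    """Hypothetical helper. counts is genes x samples, batches gives each
    sample's batch label (>= 2 samples per batch), ref names the reference."""
    labels = list(dict.fromkeys(batches))
    cols = {b: [j for j, lab in enumerate(batches) if lab == b] for b in labels}
    # Per-gene mean within each batch.
    mu = {b: [max(mean(row[j] for j in cols[b]), EPS) for row in counts]
          for b in labels}
    # A single pooled dispersion per batch.
    phi = {b: pooled_dispersion([[row[j] for j in cols[b]] for row in counts])
           for b in labels}
    adjusted = [list(row) for row in counts]
    for g, row in enumerate(counts):
        for j, b in enumerate(batches):
            if b == ref:
                continue  # reference-batch counts are preserved as-is
            y = row[j]
            # Mid-CDF of the observed count under its own batch's NB model...
            q = 0.5 * (nb_cdf(y - 1, mu[b][g], phi[b])
                       + nb_cdf(y, mu[b][g], phi[b]))
            # ...mapped to the same quantile of the reference batch's NB model.
            adjusted[g][j] = nb_quantile(min(q, 1 - 1e-9), mu[ref][g], phi[ref])
    return adjusted
```

For example, if batch "B" has roughly three times the counts of batch "A", calling `combat_ref_sketch(counts, batches, ref="A")` returns the batch-A columns untouched and pulls the batch-B counts toward the batch-A scale. Because the mapping is monotone within each gene, it also preserves the ordering of counts within a batch.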