School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae241.
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular normalization methods are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data and normalizes the raw read counts based on the remaining non-differentially expressed genes. However, a major problem with TMM is that the trimming factor applied to the M values is chosen heuristically. This paper estimates an adaptive trimming factor for TMM based on Jaeckel's estimator, with each sample acting as a reference to find the scale factor of each sample. The approach is validated on the SEQC, MAQC2, MAQC3 and PICKRELL datasets and on two simulated datasets with two-group and three-group conditions, varying the percentage of differential expression and the number of replicates. Its performance is compared with various state-of-the-art methods, and it performs better in terms of area under the receiver operating characteristic curve and differential expression.
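To make the trimming step concrete, the following is a minimal Python sketch of a TMM-style scale factor between one sample and one reference. It is a simplified, unweighted illustration and not the paper's method: the function name `tmm_factor` and its parameters are hypothetical, the fixed trim fractions (`trim_m=0.3`, `trim_a=0.05`) are assumed defaults in the spirit of standard TMM, and the adaptive choice of the trimming factor via Jaeckel's estimator described in the abstract is not implemented here.

```python
import numpy as np

def tmm_factor(sample, ref, trim_m=0.3, trim_a=0.05):
    """Simplified, unweighted TMM-style scale factor of `sample` vs. `ref`.

    Hypothetical helper for illustration. The trim fractions are fixed
    (heuristic) here; the paper instead estimates the trimming adaptively.
    """
    sample = np.asarray(sample, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Log ratios are only defined for genes with positive counts in both.
    keep = (sample > 0) & (ref > 0)
    p_s = sample[keep] / sample.sum()   # proportion of library, sample
    p_r = ref[keep] / ref.sum()         # proportion of library, reference

    m = np.log2(p_s / p_r)              # M value: log fold change
    a = 0.5 * np.log2(p_s * p_r)        # A value: mean log abundance

    n = m.size
    lo_m, hi_m = int(n * trim_m), n - int(n * trim_m)
    lo_a, hi_a = int(n * trim_a), n - int(n * trim_a)

    # Rank each gene by M and by A, then keep genes inside both trimmed ranges.
    rank_m = np.argsort(np.argsort(m))
    rank_a = np.argsort(np.argsort(a))
    inside = (rank_m >= lo_m) & (rank_m < hi_m) & (rank_a >= lo_a) & (rank_a < hi_a)

    if not inside.any():
        return 1.0                      # too few genes left to estimate a factor

    # Scale factor is 2 raised to the mean of the surviving M values.
    return float(2.0 ** m[inside].mean())
```

In this sketch, a small block of strongly up-regulated genes inflates the sample's library size, and the trimmed mean of the remaining M values recovers the resulting compositional shift. The result depends on the chosen trim fractions, which is the heuristic the abstract identifies as the weakness of fixed-trim TMM.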