Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference.

Affiliation

School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae241.

Abstract

The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular normalization methods are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data and normalizes the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with TMM is that the values of the trimming factor M are chosen heuristically. This paper estimates an adaptive value of M in TMM based on Jaeckel's Estimator, and each sample acts as a reference to find the scale factor of each sample. The presented approach is validated on the SEQC, MAQC2, MAQC3 and PICKRELL datasets and on two simulated datasets with two-group and three-group conditions, varying the percentage of differential expression and the number of replicates. The performance of the presented approach is compared with various state-of-the-art methods, and it performs better in terms of the area under the receiver operating characteristic curve and differential expression.
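
To make the trimming step concrete, the following is a minimal sketch of a TMM-style scale factor, not the paper's method. It assumes a simplified, unweighted trimmed mean (the standard TMM also applies precision weights) and takes the trim fraction as a fixed parameter, which is exactly the heuristic choice the abstract criticizes; the adaptive estimate based on Jaeckel's Estimator is not reproduced here. The function names `tmm_factor` and `multi_reference_factors` are hypothetical, and combining the per-reference factors by a geometric mean is an illustrative assumption, since the abstract does not state how the references are combined.

```python
import numpy as np


def tmm_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Simplified, unweighted TMM-style scale factor of `sample` vs `reference`.

    m_trim and a_trim are fixed (heuristic) trim fractions per tail; they are
    NOT the adaptive values proposed in the paper.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Keep genes observed in both libraries so the logarithms are finite.
    keep = (sample > 0) & (reference > 0)
    p_s = sample[keep] / sample.sum()        # proportions in the sample
    p_r = reference[keep] / reference.sum()  # proportions in the reference
    m = np.log2(p_s / p_r)                   # M: log fold change
    a = 0.5 * np.log2(p_s * p_r)             # A: average log expression
    # Drop genes in the extreme tails of M and of A.
    m_lo, m_hi = np.quantile(m, [m_trim, 1.0 - m_trim])
    a_lo, a_hi = np.quantile(a, [a_trim, 1.0 - a_trim])
    inside = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)
    # Scale factor is 2 to the mean of the remaining M values.
    return float(2.0 ** m[inside].mean())


def multi_reference_factors(counts, m_trim=0.30):
    """Let every sample act as a reference (counts: genes x samples).

    Assumption: per-reference factors are combined by a geometric mean and
    rescaled so that the final factors multiply to 1.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[1]
    f = np.array([[tmm_factor(counts[:, j], counts[:, r], m_trim)
                   for r in range(n)] for j in range(n)])
    g = np.exp(np.log(f).mean(axis=1))       # geometric mean over references
    return g / np.exp(np.log(g).mean())      # rescale: product of factors = 1


# Toy data: 100 genes, 10 of them 10x up-regulated in the sample.
reference = np.full(100, 100.0)
sample = reference.copy()
sample[:10] *= 10.0
factor = tmm_factor(sample, reference)
factors = multi_reference_factors(np.column_stack([reference, sample]))
```

In the toy data the up-regulated genes inflate the sample's library size from 10000 to 19000; trimming removes them, so the factor comes out near 10/19 and the effective library size (library size times factor) returns to about 10000.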

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95c6/11107385/a7e102d01503/bbae241f1.jpg
