Tran Diem-Trang, Might Matthew
School of Computing, University of Utah, Salt Lake City, UT, United States of America.
Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, United States of America.
PeerJ. 2021 Oct 4;9:e12233. doi: 10.7717/peerj.12233. eCollection 2021.
Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, the measures used to assess their performance have received little attention. To evaluate normalizers, researchers have relied on measures that are mostly qualitative, potentially biased, or easily confounded by parameter choices in downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much one expression matrix differs from another. If a ground-truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.
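As a rough illustration of how a condition-number based deviation between two expression matrices might behave, the sketch below assumes one plausible formulation that the abstract does not spell out: fit the least-squares matrix A that maps one genes-by-samples matrix onto the other, and report the condition number of A. The function name, the simulated counts, and the formulation itself are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def condition_number_deviation(x, y):
    """Hypothetical sketch: condition number of the least-squares A with x @ A ~= y.

    x and y are genes-by-samples expression matrices of the same shape.
    This formulation is an assumption, not taken from the abstract.
    """
    a, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.linalg.cond(a))


# Simulated "ground truth" expression values: 200 genes, 4 samples.
rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=50.0, size=(200, 4))

# Same matrix up to one global factor: no sample is distorted relative to another.
global_scaled = 3.0 * truth

# Each sample scaled by a different factor: relative sample depths are distorted.
skewed = truth * np.array([1.0, 2.0, 4.0, 8.0])

score_global = condition_number_deviation(truth, global_scaled)
score_skewed = condition_number_deviation(truth, skewed)
```

Under this assumed formulation, a uniform rescaling of all samples scores close to 1, while sample-specific scaling errors raise the score in proportion to the spread of the scaling factors.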