Department of Statistical Science, Southern Methodist University, Dallas, TX 75275-0332, USA.
Department of Population and Data Sciences, Quantitative Biomedical Research Center, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Bioinformatics. 2020 Jun 1;36(11):3401-3408. doi: 10.1093/bioinformatics/btaa153.
Recent studies have shown that RNA-sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from formalin-fixed paraffin-embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed for fresh-frozen or similar-type samples, may perform suboptimally on FFPE data.
We proposed a new normalization method, labeled MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and models expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm, in which closed-form updates are available in each iteration. By eliminating the need for numerical optimization in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. MIXnorm makes a significant improvement over commonly used methods for RNA-seq expression data.
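To illustrate what a closed-form EM update looks like, the following minimal sketch fits a single zero-inflated Poisson distribution, the kind of distribution the abstract assigns to non-expressed genes. It is not the authors' nested EM algorithm and omits the truncated normal component entirely; the function names `fit_zip_em` and `simulate_zip`, the starting values, and the toy parameters are assumptions made for illustration only.

```python
import math
import random


def fit_zip_em(counts, n_iter=500, tol=1e-12):
    """EM for a zero-inflated Poisson (illustrative sketch, not MIXnorm itself).

    Model: P(y = 0) = pi + (1 - pi) * exp(-lam)
           P(y = k) = (1 - pi) * Poisson(k; lam) for k >= 1
    Returns the estimates (pi, lam).
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    if n_zero == 0:
        # No zeros observed: the fit reduces to a plain Poisson.
        return 0.0, total / n

    # Hypothetical starting values: half of the zeros are structural.
    pi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        # It is the same for every zero; non-zero counts have probability 0.
        tau = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: both updates are closed-form, no numerical optimizer.
        structural = n_zero * tau
        new_pi = structural / n
        new_lam = total / (n - structural)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def simulate_zip(rng, n, pi, lam):
    """Draw n zero-inflated Poisson counts (Knuth's Poisson sampler)."""
    limit = math.exp(-lam)
    counts = []
    for _ in range(n):
        if rng.random() < pi:
            counts.append(0)  # structural zero
            continue
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        counts.append(k)
    return counts
```

Calling `fit_zip_em(simulate_zip(random.Random(0), 20000, 0.3, 4.0))` should return estimates close to the simulated values of 0.3 and 4.0. Because the M-step reduces to two ratios of sums, each iteration costs only a pass over the data, which is the property the abstract credits for the algorithm's ease of implementation and efficiency.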
R code available at https://github.com/S-YIN/MIXnorm.
Supplementary data are available at Bioinformatics online.