MIXnorm: normalizing RNA-seq data from formalin-fixed paraffin-embedded samples.

Affiliations

Department of Statistical Science, Southern Methodist University, Dallas, TX 75275-0332, USA.

Department of Population and Data Sciences, Quantitative Biomedical Research Center, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3401-3408. doi: 10.1093/bioinformatics/btaa153.

DOI:10.1093/bioinformatics/btaa153

PMID:32134470

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7267832/

Abstract

MOTIVATION

Recent studies have shown that RNA-sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from formalin-fixed paraffin-embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed for fresh-frozen or similar sample types, may perform suboptimally when applied to FFPE data.

RESULTS

We proposed a new normalization method, labeled MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and models expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm, in which closed-form updates are available in each iteration. By eliminating the need for numerical optimization in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. MIXnorm makes a significant improvement over commonly used methods for RNA-seq expression data.
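To illustrate what "closed-form updates in each iteration" can look like, here is a minimal Python sketch of the textbook EM algorithm for a single zero-inflated Poisson distribution, the family the abstract uses for non-expressed genes. It is not the authors' nested EM algorithm and not their R implementation: the function name `zip_em`, the starting values, and the stopping rule are assumptions made for this example, and the full MIXnorm model (with its truncated normal component and mixture layer) is not reproduced.

```python
import math


def zip_em(counts, n_iter=1000, tol=1e-12):
    """Fit a zero-inflated Poisson by EM; returns (pi, lam).

    pi  : probability of a structural (excess) zero
    lam : rate of the Poisson part
    Hypothetical helper for illustration only, not part of MIXnorm.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must be non-empty")
    n_zero = sum(1 for x in counts if x == 0)
    total = sum(counts)

    # Crude starting values (an arbitrary choice for this sketch).
    pi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        # Non-zero counts can only come from the Poisson part.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * z

        # M-step: both updates are closed-form, no numerical optimizer needed.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)

        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

Calling `zip_em` on a list of non-negative integer counts returns the estimated structural-zero probability and Poisson rate. Because each M-step is a ratio of sums, an iteration costs one pass over the data, which is the kind of efficiency the abstract attributes to avoiding numerical optimization.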

AVAILABILITY AND IMPLEMENTATION

R code available at https://github.com/S-YIN/MIXnorm.

CONTACT

swang@smu.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
