A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.

Affiliation

Department of Computer Science, University of California-Irvine, CA, USA.

Publication information

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S11. doi: 10.1186/1471-2105-14-S5-S11. Epub 2013 Apr 10.

DOI:10.1186/1471-2105-14-S5-S11

PMID:23735186

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3622628/

Abstract

BACKGROUND

RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact.

RESULTS

Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm requires only a runtime proportional to the data size and a small, constant amount of memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT.

CONCLUSIONS

The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
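
To make the two-cell-type setting in the abstract concrete, the following is a minimal sketch of expression deconvolution with a mixture model. It is not TEMT's implementation. It assumes, purely for illustration, that every read maps to exactly one transcript, that the transcript profile of one cell type and its fraction of reads in the mixed sample are already known, and that there are no positional or sequence-specific biases. It also uses plain batch EM rather than the online EM described above. The function name `estimate_second_cell_type` and all toy numbers are hypothetical.

```python
import numpy as np

def estimate_second_cell_type(mixed_counts, known_profile, known_fraction, n_iter=200):
    """Toy batch EM for a two-cell-type mixture (hypothetical helper, not part of TEMT).

    mixed_counts   : reads per transcript observed in the mixed sample
    known_profile  : transcript proportions of the known cell type (sums to 1)
    known_fraction : assumed fraction of reads that come from the known cell type
    Returns the estimated transcript proportions of the other cell type.
    """
    counts = np.asarray(mixed_counts, dtype=float)
    a = np.asarray(known_profile, dtype=float)
    p = float(known_fraction)
    b = np.full_like(a, 1.0 / a.size)  # start from a uniform profile
    for _ in range(n_iter):
        # E-step: probability that a read on each transcript came from the unknown cell type
        denom = p * a + (1.0 - p) * b
        resp = np.divide((1.0 - p) * b, denom, out=np.zeros_like(b), where=denom > 0)
        # M-step: renormalise the expected read counts assigned to the unknown cell type
        expected = counts * resp
        b = expected / expected.sum()
    return b

# Toy data (assumed values): four transcripts, 40% of reads from the known cell type
a_true = np.array([0.5, 0.3, 0.2, 0.0])
b_true = np.array([0.1, 0.2, 0.3, 0.4])
p = 0.4
mixed = 10000 * (p * a_true + (1 - p) * b_true)

b_hat = estimate_second_cell_type(mixed, a_true, p)
naive = mixed / mixed.sum()  # abundance estimate that ignores heterogeneity
print("deconvolved:", np.round(b_hat, 3))
print("naive:      ", np.round(naive, 3))
```

On this noise-free toy input, the deconvolved profile approaches the second cell type's true proportions, while the naive estimate stays contaminated by the known cell type. Real data would add sampling noise, multi-mapping reads, and the biases that the abstract says TEMT models.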
