
Systematic evaluation of transcriptomics-based deconvolution methods and references using thousands of clinical samples.

Affiliations

Department of Molecular Cellular and Developmental Biology, University of California Los Angeles, Los Angeles, CA, USA.

Bioinformatics Interdepartmental Degree Program, University of California Los Angeles, Los Angeles, CA, USA.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab265.


PMID: 34346485

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8768458/

Abstract

Estimating the cell type composition of blood and tissue samples is a biological challenge relevant to both laboratory studies and clinical care. In recent years, a number of computational tools have been developed to estimate cell type abundance from gene expression data. Although these tools use a variety of approaches, they all leverage expression profiles from purified cell types to evaluate the cell type composition of samples. In this study, we compare 12 cell type quantification tools and evaluate their performance with each of 10 separate reference profiles. Specifically, we ran each tool on over 4000 samples with known cell type proportions, spanning both immune and stromal cell types. Of these, 12 are in vitro synthetic mixtures and 300 are in silico synthetic mixtures prepared from single-cell data. The remaining 3728 are clinical samples from the Framingham cohort, whose cell populations were quantified by electrical impedance cell counting. On the Framingham dataset, Estimating the Proportions of Immune and Cancer cells (EPIC) produces the highest correlation, whereas the Gene Expression Deconvolution Interactive Tool (GEDIT) produces the lowest error. The best tool varies across the other datasets, but CIBERSORT and GEDIT most consistently produce accurate results. We find that the optimal reference depends on the tool used, and we report suggested references for each tool. Most tools return results within minutes, but on large datasets CIBERSORT runtimes can exceed hours or even days. We conclude that deconvolution methods are capable of returning high-quality results, but that proper reference selection is critical.
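As a rough illustration of the shared idea, reference-based deconvolution models a bulk expression vector as a non-negative combination of purified cell-type profiles (a signature matrix), and a benchmark then compares estimated against known proportions using a correlation and an error metric. The sketch below is a minimal, hypothetical example in Python with NumPy; it is not the algorithm of EPIC, GEDIT, CIBERSORT or any other benchmarked tool. It assumes non-negative least squares solved by projected gradient descent, estimates rescaled to sum to 1, Pearson correlation as the correlation metric and root-mean-square error (RMSE) as the error metric. The function names `deconvolve`, `pearson` and `rmse`, and the toy signature matrix and proportions, are illustrative assumptions, not data from the study.

```python
import numpy as np


def deconvolve(mixture, reference, n_iter=5000):
    """Hypothetical sketch: estimate cell-type proportions for one bulk sample.

    mixture   -- shape (n_genes,), bulk expression of one sample
    reference -- shape (n_genes, n_cell_types), purified cell-type profiles
    Solves min ||reference @ w - mixture||^2 subject to w >= 0 by projected
    gradient descent, then rescales w to sum to 1 (an assumed convention).
    """
    R = np.asarray(reference, dtype=float)
    m = np.asarray(mixture, dtype=float)
    # Step size 1/L, where L is the largest singular value of R squared
    # (the Lipschitz constant of the least-squares gradient).
    step = 1.0 / np.linalg.norm(R, 2) ** 2
    w = np.full(R.shape[1], 1.0 / R.shape[1])  # start from equal proportions
    for _ in range(n_iter):
        grad = R.T @ (R @ w - m)
        w = np.maximum(w - step * grad, 0.0)  # gradient step, then project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


def pearson(estimated, truth):
    """Pearson correlation between estimated and known proportions."""
    return float(np.corrcoef(estimated, truth)[0, 1])


def rmse(estimated, truth):
    """Root-mean-square error between estimated and known proportions."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy signature matrix: 5 genes x 3 cell types (made-up values).
reference = np.array([
    [10.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [5.0, 2.0, 0.0],
    [0.0, 3.0, 4.0],
])
# Known proportions for three synthetic samples (one row per sample).
truth = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
])
# Noise-free in silico mixtures: row i equals reference @ truth[i].
mixtures = truth @ reference.T

estimates = np.array([deconvolve(m, reference) for m in mixtures])
print(np.round(estimates, 3))
print("correlation:", round(pearson(estimates.ravel(), truth.ravel()), 4))
print("RMSE:", round(rmse(estimates, truth), 6))
```

Because these toy mixtures are noise-free and the signature matrix has full column rank, the estimates should recover the known proportions. With real data, measurement noise, cell types missing from the reference and the choice of reference itself all reduce accuracy, and real tools add steps such as marker-gene selection and normalization that this sketch omits.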




