A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing.

Authors

Chen Wenan, Zhang Silu, Williams Justin, Ju Bensheng, Shaner Bridget, Easton John, Wu Gang, Chen Xiang

Affiliations

Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, TN, United States.

Department of Diagnostic Imaging, St. Jude Children's Research Hospital, Memphis, TN, United States.

Publication information

Comput Struct Biotechnol J. 2020 Mar 30;18:861-873. doi: 10.1016/j.csbj.2020.03.026. eCollection 2020.

DOI: 10.1016/j.csbj.2020.03.026
PMID: 32322368
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7163294/

Abstract

Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate 11 methods for batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperformed approaches using a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate variable based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except that of SVA) deteriorated substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2). Finally we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables instead of using batch-corrected data; and 2) employ SVA for latent batch correction. However, better methods are still needed to fully unleash the power of scRNA-seq.
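
The first recommendation (include a known batch variable as a covariate in the regression model) can be illustrated with a minimal sketch. This is not the paper's code or any of the 11 evaluated methods: it assumes ordinary least squares on noise-free, log-scale toy values for a single gene, and the design, cell counts, effect sizes, and helper names (`solve_linear_system`, `ols`) are all hypothetical. It only shows why a batch shift can masquerade as a group effect when the groups are unevenly spread across batches.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy design: (group, batch, number of cells).
# Group 0 sits mostly in batch 0 and group 1 mostly in batch 1.
cells = [(0, 0, 8), (1, 0, 2), (0, 1, 2), (1, 1, 8)]

group, batch, y = [], [], []
for g, b, n in cells:
    for _ in range(n):
        group.append(g)
        batch.append(b)
        # Assumed truth: no group effect, batch 1 shifted up by 2 (log scale).
        y.append(1.0 + 0.0 * g + 2.0 * b)

# Model 1 ignores batch: intercept + group.
naive = ols([[1.0, g] for g in group], y)
# Model 2 adds the known batch label as a covariate: intercept + group + batch.
adjusted = ols([[1.0, g, b] for g, b in zip(group, batch)], y)

naive_group_effect = naive[1]
adjusted_group_effect = adjusted[1]
print("group effect without batch covariate:", round(naive_group_effect, 6))
print("group effect with batch covariate:   ", round(adjusted_group_effect, 6))
```

On this toy design the unadjusted model attributes a group difference of about 1.2 to a gene that has no group effect, while the model with the batch covariate estimates the group effect as essentially zero and recovers the batch shift of 2. Real UMI count data would call for a count-based model and a proper test statistic; the sketch only covers the confounding argument.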
