
Missing data and technical variability in single-cell RNA-sequencing experiments.

Affiliations

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.

Publication information

Biostatistics. 2018 Oct 1;19(4):562-578. doi: 10.1093/biostatistics/kxx053.


PMID: 29121214
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6215955/
Abstract

Until recently, high-throughput gene expression technologies, such as RNA-sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on the data it produces. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a level sufficient to be detected by the sequencing technology). Another difference is that the proportion of genes with a reported expression level of zero varies substantially more across single cells than across RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation, demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower-expressed genes. In addition, this missing-data problem is exacerbated because the technical variation itself differs from cell to cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem.
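The abstract's central quantitative claim, that scRNA-seq yields more zeros than a count model predicts and that the excess is larger for lower-expressed genes, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the generative model (Poisson counts with a logistic, expression-dependent, cell-specific dropout), the helper names `simulate_counts` and `excess_zeros`, and every parameter value are hypothetical choices made for illustration, and the "expected" zeros come from a simple size-factor-adjusted Poisson model. Only NumPy is used.

```python
import numpy as np


def simulate_counts(n_genes=2000, n_cells=200, seed=0):
    """Return a genes x cells count matrix with technical dropout, plus the
    true gene means and per-cell dropout midpoints.

    Hypothetical generative model (illustrative assumption, not the paper's):
      true count ~ Poisson(mu_g * s_j), s_j = per-cell capture efficiency
      dropped    ~ Bernoulli(1 / (1 + exp(k * (log(mu_g) - m_j))))
    Lower-expressed genes drop out more often, and cells with a larger
    midpoint m_j lose more of their genes.
    """
    rng = np.random.default_rng(seed)
    mu = np.exp(rng.uniform(0.0, np.log(200.0), size=n_genes))  # gene means in [1, 200)
    s = rng.lognormal(mean=0.0, sigma=0.2, size=n_cells)        # capture efficiency
    m = rng.normal(loc=1.0, scale=0.5, size=n_cells)            # dropout midpoint (log scale)
    k = 1.5                                                     # steepness of the dropout curve

    true_counts = rng.poisson(np.outer(mu, s))
    p_drop = 1.0 / (1.0 + np.exp(k * (np.log(mu)[:, None] - m[None, :])))
    dropped = rng.random(true_counts.shape) < p_drop
    observed = np.where(dropped, 0, true_counts)
    return observed, mu, m


def excess_zeros(counts):
    """Observed minus expected zero fraction per gene (rows = genes).

    Expected zeros come from a size-factor-adjusted Poisson model:
      s_j = total_j / mean(total)
      lambda_g = sum_j y_gj / sum_j s_j
      expected_g = mean_j exp(-lambda_g * s_j)
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)
    size_factors = totals / totals.mean()
    lam = counts.sum(axis=1) / size_factors.sum()
    expected = np.exp(-np.outer(lam, size_factors)).mean(axis=1)
    observed = (counts == 0).mean(axis=1)
    return observed - expected


if __name__ == "__main__":
    counts, mu, m = simulate_counts()
    excess = excess_zeros(counts)
    print(f"mean excess zeros, genes with mu < 10: {excess[mu < 10].mean():.3f}")
    print(f"mean excess zeros, genes with mu > 50: {excess[mu > 50].mean():.3f}")

    # Per-cell proportion of zeros: varies because m_j varies between cells.
    cell_zero_frac = (counts == 0).mean(axis=0)
    print(f"per-cell zero fraction: {cell_zero_frac.min():.2f} to {cell_zero_frac.max():.2f}")
    print(f"corr(per-cell zero fraction, m): {np.corrcoef(cell_zero_frac, m)[0, 1]:.2f}")
```

In this toy setting the excess is concentrated in the low-expression genes, and the per-cell zero fraction tracks the hidden technical parameter `m`. In real data that parameter is not observed, so any downstream summary that moves with the per-cell zero fraction could be mistaken for biological structure. The fixed seed only makes the sketch reproducible; a pure Poisson matrix passed to `excess_zeros` should give excess values near zero.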

Similar articles

[1] Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2018-10-1.
[2] scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics. 2019-5-8.
[3] Detection of high variability in gene expression from single-cell RNA-seq profiling. BMC Genomics. 2016-8-22.
[4] Normalization of Single-Cell RNA-Seq Data. Methods Mol Biol. 2021.
[5] Microfluidic single-cell whole-transcriptome sequencing. Proc Natl Acad Sci U S A. 2014-4-29.
[6] Analysis of Technical and Biological Variability in Single-Cell RNA Sequencing. Methods Mol Biol. 2019.
[7] No detectable alloreactive transcriptional responses under standard sample preparation conditions during donor-multiplexed single-cell RNA sequencing of peripheral blood mononuclear cells. BMC Biol. 2021-1-20.
[8] SCnorm: robust normalization of single-cell RNA-seq data. Nat Methods. 2017-6.
[9] Quality Control of Single-Cell RNA-seq. Methods Mol Biol. 2019.
[10] A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa. PLoS Comput Biol. 2018-4-9.

Cited by

[1] A Benchmark of Semi-Supervised scRNA-seq Integration Methods in Real-World Scenarios. bioRxiv. 2025-8-27.
[2] Missing data in single-cell transcriptomes reveals transcriptional shifts. bioRxiv. 2025-8-21.
[3] Single-cell multi-omics in cancer immunotherapy: from tumor heterogeneity to personalized precision treatment. Mol Cancer. 2025-8-25.
[4] BioLLM: A standardized framework for integrating and benchmarking single-cell foundation models. Patterns (N Y). 2025-7-30.
[5] RESCUE: recovery of idiosyncratic expression patterns in spatial transcriptomics. bioRxiv. 2025-8-15.
[6] Biomaterial-mediated Cell Atlas: an insight from single-cell and spatial transcriptomics. Bioact Mater. 2025-8-8.
[7] Simulating paired and longitudinal single-cell RNA sequencing data with rescueSim. Bioinformatics. 2025-8-2.
[8] Discordant effects of maternal age on the human MII oocyte transcriptome. Mol Hum Reprod. 2025-7-3.
[9] Benchmarking of computational demultiplexing methods for single-nucleus RNA sequencing data. Brief Bioinform. 2025-7-2.
[10] Critical gene network and signaling pathway analysis of the extracellular signal-regulated kinase (ERK) pathway in ischemic stroke. Front Mol Neurosci. 2025-6-25.
