Department of Statistics, University of California, Los Angeles, 90095-1554, CA, USA.
Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, 90095-7246, CA, USA.
Genome Biol. 2022 Jan 21;23(1):31. doi: 10.1186/s13059-022-02601-5.
Researchers view the abundant zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard them as missing data to be corrected. To help address this controversy, we discuss the sources of biological and non-biological zeros; introduce five mechanisms for adding non-biological zeros in computational benchmarking; evaluate the impact of non-biological zeros on data analysis; benchmark three input data types (observed counts, imputed counts, and binarized counts); discuss open questions regarding non-biological zeros; and advocate for transparent analysis.
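To make two of the ideas above concrete, here is a minimal sketch of what binarized counts look like and how non-biological zeros could be added to a count matrix for benchmarking. The abstract does not describe the five masking mechanisms, so `mask_uniform` is a hypothetical stand-in (uniform random masking of nonzero entries) and should not be read as one of the authors' mechanisms; the function names and the toy matrix are likewise assumptions made for illustration.

```python
import random


def binarize(counts):
    """Map each count to 1 if it is nonzero and 0 otherwise."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def mask_uniform(counts, rate, seed=0):
    """Set each nonzero entry to zero independently with probability `rate`.

    Hypothetical masking scheme used only for illustration; existing
    zeros are left untouched.
    """
    rng = random.Random(seed)
    return [
        [0 if (c > 0 and rng.random() < rate) else c for c in row]
        for row in counts
    ]


def zero_fraction(counts):
    """Return the proportion of entries in the matrix that are zero."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for c in row if c == 0)
    return zeros / total


# Toy gene-by-cell count matrix (made-up values).
counts = [
    [0, 3, 5, 0],
    [2, 0, 0, 7],
    [1, 1, 4, 0],
]

masked = mask_uniform(counts, rate=0.5)
print("zero fraction before masking:", zero_fraction(counts))
print("zero fraction after masking: ", zero_fraction(masked))
print("binarized observed counts:   ", binarize(counts))
```

Because masking only ever turns nonzero entries into zeros, the zero fraction after masking can never be lower than before. A benchmark in this spirit would run the same downstream analysis on the original and masked matrices and compare the results.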