Willem Theresa, Shitov Vladimir A, Luecken Malte D, Kilbertus Niki, Bauer Stefan, Piraud Marie, Buyx Alena, Theis Fabian J
TUM School for Medicine and Health, Institute of History and Ethics in Medicine, Technical University of Munich, Munich, Germany.
Helmholtz Munich, Munich, Germany.
Nat Cell Biol. 2025 Mar;27(3):384-392. doi: 10.1038/s41556-025-01619-8. Epub 2025 Feb 19.
Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis: societal biases affecting whose samples are collected, clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples, and biases arising during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.