Rautenstrauch Pia, Ohler Uwe
Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany.
Humboldt-Universität zu Berlin, Department of Computer Science, Berlin, Germany.
Nat Biotechnol. 2025 Jul 30. doi: 10.1038/s41587-025-02743-4.
Single-cell studies rely on advanced integration methods for complex datasets affected by batch effects from technical factors alongside meaningful biological variation. Silhouette is an established metric for assessing unsupervised clustering results, comparing within-cluster cohesion to between-cluster separation. However, silhouette's assumptions are typically violated in single-cell data integration scenarios. We demonstrate that silhouette-based metrics cannot reliably assess batch effect removal or biological signal conservation and propose more robust evaluation strategies.
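To make the cohesion-versus-separation comparison concrete, the following is a minimal sketch of the standard per-point silhouette width, s = (b - a) / max(a, b), where a is the mean distance to the other members of a point's own cluster and b is the smallest mean distance to any other cluster. It is not the authors' evaluation code: the function name `silhouette_scores`, the toy coordinates, and the labels are illustrative assumptions, and Euclidean distance is assumed throughout. In an integration benchmark the labels would typically be cell-type or batch annotations rather than the toy groups used here.

```python
from math import dist
from statistics import mean


def silhouette_scores(points, labels):
    """Return the silhouette width of every point (hypothetical helper).

    a = mean distance to the other members of the point's own cluster
    b = smallest mean distance to the members of any other cluster
    s = (b - a) / max(a, b), which lies in [-1, 1]
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data (assumed): two compact, well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

# Labels that match the geometry give scores close to 1.
matched = silhouette_scores(points, ["A", "A", "B", "B"])

# Labels that cut across the geometry give negative scores.
mismatched = silhouette_scores(points, ["A", "B", "A", "B"])
```

The score depends only on distances relative to the supplied labels, so it rewards compact, well-separated groups and says nothing about whether that geometry is the appropriate one for the labels being evaluated.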