Houdjedj Aissa, Marouf Yacine, Myradov Mekan, Doğan Süleyman Onur, Erten Burak Onur, Tastan Oznur, Erten Cesim, Kazan Hilal
Antalya Bilim University, 07190, Antalya, Turkey.
Akdeniz University, 07058, Antalya, Turkey.
BMC Bioinformatics. 2025 Mar 27;26(1):92. doi: 10.1186/s12859-025-06087-3.
As single-cell genomics experiments increase in complexity and scale, the need to integrate multiple datasets has grown. Such integration enhances cellular feature identification by leveraging larger data volumes. However, batch effects (technical variations arising from differences in labs, times, or protocols) pose a significant challenge. Despite numerous proposed batch correction methods, many still have limitations, such as outputting only dimension-reduced data, relying on computationally intensive models, or overcorrecting batches with diverse cell type composition.
We introduce a novel method for batch effect correction named SCITUNA, a Single-Cell data Integration Tool Using Network Alignment. We perform evaluations on 39 individual batches from four real datasets and a simulated dataset, which include both scRNA-seq and scATAC-seq datasets, spanning multiple organisms and tissues. A thorough comparison with existing batch correction methods using 13 metrics reveals that SCITUNA outperforms current approaches and successfully preserves the biological signals present in the original data. In particular, SCITUNA performs better than current methods in all comparisons except the multi-batch integration of the lung dataset, where the difference is 0.004.
SCITUNA effectively removes batch effects while retaining the biological signals present in the data. Our extensive experiments reveal that SCITUNA will be a valuable tool for diverse integration tasks.