Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States.
Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae537.
The growing number of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets, such as larger sample sizes and more robust analyses. However, inherent diversity and batch discrepancies within samples or across studies continue to pose significant challenges for computational analyses. Two practical questions still lack definitive answers: Should we apply a specific integration method, or simply merge the datasets for joint analysis? And among the existing data integration methods, which one is most suitable in a given scenario?
To fill this gap, we introduce SCIntRuler, a novel statistical metric for guiding the integration of multiple scRNA-seq datasets. SCIntRuler helps researchers make informed decisions about whether data integration is necessary and which integration method to select. Our simulations and real data applications demonstrate that SCIntRuler streamlines this decision-making and facilitates the analysis of diverse scRNA-seq datasets in varying contexts, thereby reducing the complexity of integrating heterogeneous scRNA-seq datasets.
Our method is implemented as an open-source R package, available on CRAN together with a user-friendly manual: https://cloud.r-project.org/web/packages/SCIntRuler/index.html.