Wei Xin, Ma Wenjing, Wu Zhijin, Wu Hao
Department of Biostatistics, Brown University, Providence, USA.
Department of Biostatistics, University of Michigan-Ann Arbor, Ann Arbor, USA.
Genome Biol. 2025 Jun 10;26(1):157. doi: 10.1186/s13059-025-03614-6.
Cell-type identification is a crucial step in single-cell RNA-seq (scRNA-seq) data analysis, for which supervised methods are preferred due to their accuracy and efficiency. The performance of these methods depends heavily on the quality of the reference data, but there is no method for selecting and constructing reference data. We develop Target-Oriented Reference Construction (TORC), a widely applicable strategy for constructing reference data for a given target dataset in supervised scRNA-seq cell-type identification. TORC alleviates the differences in data distribution and cell-type composition between reference and target. Extensive benchmarks on simulated and real data demonstrate that TORC consistently improves cell-type identification.