Xiong Yi-Xuan, Zhang Xiao-Fei
School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae072.
The proliferation of single-cell RNA-seq data has greatly enhanced our ability to study the complexity of diverse tissues. However, accurately annotating cell types in such data remains a significant challenge, especially when multiple reference datasets must be combined and novel cell types identified. To address these issues, we introduce Single Cell annotation based on Distance metric learning and Optimal Transport (scDOT), a cell-type annotation method that integrates multiple reference datasets and uncovers previously unseen cell types. scDOT makes two key contributions. First, it combines distance metric learning and optimal transport in a novel optimization framework. This framework learns the predictive power of each reference dataset for new query data and simultaneously establishes a probabilistic mapping between cells in the query data and reference-defined cell types. Second, scDOT derives an interpretable scoring system from the learned probabilistic mapping, enabling the identification of previously unseen cell types in the query data. To assess scDOT's capabilities, we systematically evaluate its performance on two collections of benchmark datasets spanning various tissues, sequencing technologies and cell types. Our experimental results consistently show that scDOT outperforms competing methods in cell-type annotation and in the identification of previously unseen cell types. These advances provide researchers with a powerful tool for precise cell-type annotation, ultimately enriching our understanding of complex biological tissues.
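To make the idea of a probabilistic mapping between query cells and reference-defined cell types more concrete, the following is a minimal sketch of entropy-regularized optimal transport (Sinkhorn iterations) in NumPy. It is not scDOT's actual formulation: it omits distance metric learning and the weighting of multiple references, it assumes a plain squared Euclidean cost to one centroid per cell type, and it assumes uniform mass on cells and on cell types. The function names `sinkhorn_plan` and `annotate`, the parameters `eps` and `n_iter`, and the `1 - max probability` score are all hypothetical choices made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, eps=0.05, n_iter=500):
    """Entropy-regularized OT plan with row marginals a and column marginals b."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def annotate(query, centroids, eps=0.05):
    """Map query cells to cell-type centroids (illustrative only, not scDOT)."""
    # Assumption: squared Euclidean cost, rescaled to [0, 1] for stability.
    diff = query[:, None, :] - centroids[None, :, :]
    cost = (diff ** 2).sum(axis=2)
    cost = cost / cost.max()
    n_cells, n_types = cost.shape
    # Assumption: uniform mass on query cells and on cell types.
    a = np.full(n_cells, 1.0 / n_cells)
    b = np.full(n_types, 1.0 / n_types)
    plan = sinkhorn_plan(cost, a, b, eps)
    # Row-normalize the plan: each row becomes a distribution over cell types.
    probs = plan / plan.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)
    # Hypothetical uncertainty score: high when no single type dominates.
    uncertainty = 1.0 - probs.max(axis=1)
    return probs, labels, uncertainty


# Toy data: three well-separated "cell types" in a 2-D embedding.
rng = np.random.default_rng(0)
centroids = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
true_labels = np.repeat(np.arange(3), 20)
query = centroids[true_labels] + 0.1 * rng.standard_normal((60, 2))

probs, labels, uncertainty = annotate(query, centroids)
```

Each row of `probs` is a probability distribution over the reference cell types for one query cell. In this balanced toy setting every cell must send its mass to some known type, so a realistic treatment of unseen cell types would need a formulation that relaxes the marginal constraints, which this sketch does not attempt.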