Yang Tao, Yan Qiyu, Long Rongzhuo, Liu Zhixian, Wang Xiaosheng
Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.
Cancer Genomics Research Center, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.
Comput Struct Biotechnol J. 2023 Jul 11;21:3604-3614. doi: 10.1016/j.csbj.2023.07.009. eCollection 2023.
We propose PreCanCell, a novel algorithm for predicting malignant and non-malignant cells from single-cell transcriptomes. PreCanCell first identifies the differentially expressed genes (DEGs) between malignant and non-malignant cells that are shared across single-cell transcriptome datasets from five common cancer types: renal cell carcinoma (RCC), head and neck squamous cell carcinoma (HNSCC), melanoma, lung adenocarcinoma (LUAD), and breast cancer (BC). Using each of the five datasets as a training set and the DEGs as features, it classifies a single cell as malignant or non-malignant by k-NN (k = 5). The cell's final label is then determined by majority vote over the five k-NN classification results. We tested the predictive performance of PreCanCell on 19 single-cell datasets and report classification accuracy, sensitivity, specificity, balanced accuracy (the average of sensitivity and specificity), and the area under the receiver operating characteristic curve (AUROC). In all of these datasets, PreCanCell achieved values above 0.8 for accuracy, sensitivity, specificity, balanced accuracy, and AUROC. Finally, we compared the predictive performance of PreCanCell with that of seven other algorithms: CHETAH, SciBet, SCINA, scmap-cell, scmap-cluster, SingleR, and ikarus. Compared with these algorithms, PreCanCell offers higher accuracy and a simpler implementation. We have developed an R package for the PreCanCell algorithm, which is available at https://github.com/WangX-Lab/PreCanCell.
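The classification scheme described in the abstract (one k-NN classifier per training dataset, combined by majority vote) and the reported evaluation metrics can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in the PreCanCell R package. It assumes each cell is already represented by the expression values of the shared DEGs (the DEG-selection step is not shown) and uses Euclidean distance. As a hypothetical choice, it also uses the fraction of malignant votes as the per-cell score for AUROC, since the abstract does not say how PreCanCell scores cells. The function names and the toy data are illustrative only.

```python
from collections import Counter
import math

# Label strings are illustrative; "malignant" is treated as the positive class.
MALIGNANT = "malignant"
NON_MALIGNANT = "non-malignant"


def knn_predict(train_X, train_y, x, k=5):
    """Label one cell by majority among its k nearest training cells.

    Assumes rows of train_X and x hold the same DEG features in the same order.
    Euclidean distance is an assumption, not taken from the PreCanCell package.
    """
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def ensemble_predict(training_sets, x, k=5):
    """Majority vote over one k-NN call per training set.

    Returns the voted label and the fraction of malignant votes; that fraction
    is a hypothetical score used here only to illustrate AUROC.
    """
    votes = [knn_predict(X, y, x, k) for X, y in training_sets]
    counts = Counter(votes)
    # With an odd number of training sets (five here) and two labels, no tie is possible.
    label = counts.most_common(1)[0][0]
    return label, counts[MALIGNANT] / len(votes)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, balanced accuracy (both classes must be present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == MALIGNANT and p == MALIGNANT for t, p in pairs)
    tn = sum(t == NON_MALIGNANT and p == NON_MALIGNANT for t, p in pairs)
    fp = sum(t == NON_MALIGNANT and p == MALIGNANT for t, p in pairs)
    fn = sum(t == MALIGNANT and p == NON_MALIGNANT for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auroc(y_true, scores):
    """AUROC as the probability a malignant cell outscores a non-malignant one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == MALIGNANT]
    neg = [s for t, s in zip(y_true, scores) if t == NON_MALIGNANT]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: five tiny two-gene "training sets", fabricated for illustration only.
def toy_set(shift):
    X = [[5 + shift, 5], [5, 6 + shift], [6, 5], [0, 0], [1 + shift, 0], [0, 1]]
    y = [MALIGNANT] * 3 + [NON_MALIGNANT] * 3
    return X, y


training_sets = [toy_set(0.1 * i) for i in range(5)]
test_cells = [[5.5, 5.5], [0.2, 0.2], [4.8, 5.2], [0.5, 0.6]]
truth = [MALIGNANT, NON_MALIGNANT, MALIGNANT, NON_MALIGNANT]

results = [ensemble_predict(training_sets, cell) for cell in test_cells]
predicted = [label for label, _ in results]
scores = [score for _, score in results]
print(binary_metrics(truth, predicted), auroc(truth, scores))
```

Using an odd number of training sets with two labels guarantees that the final vote never ties; with an even number, `Counter.most_common` would break ties by first-seen order, so a real implementation would need an explicit tie rule.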