Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States.
Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad512.
With the recent advent of single-cell-level biological understanding, there is growing interest in identifying cell states or subtypes that are homogeneous in gene expression and also enriched in certain biological conditions, such as disease samples versus normal samples (condition-specific cell subtypes). Despite the importance of identifying condition-specific cell subtypes, existing methods train separate models for the gene expression and the biological condition information, which leads to the following limitations: (1) they do not consider potential interactions between the two types of information, and (2) the relative weights of the two types of information are not properly controlled. In addition, (3) they do not consider non-linear relationships in the gene expression and the biological condition. To address these limitations and accurately identify condition-specific cell subtypes, we develop scDeepJointClust, the first method that jointly trains on both types of information via a deep neural network. scDeepJointClust takes the results of state-of-the-art gene-expression-based clustering methods as input, thereby incorporating their sophistication and accuracy. We evaluated scDeepJointClust on simulation data in diverse scenarios and on biological data from different diseases (melanoma and non-small-cell lung cancer) and showed that it outperforms existing methods in terms of sensitivity and specificity. scDeepJointClust exhibits significant promise in advancing our understanding of cellular states and their implications in complex biological systems.