Li Ruixiang, Shi Fangyuan, Song Lijuan, Yu Zhenhua
School of Information Engineering, Ningxia University, Yinchuan, 750021, China.
Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, 750021, China.
BMC Genomics. 2024 Apr 22;25(1):393. doi: 10.1186/s12864-024-10319-w.
Accurately deciphering clonal copy number substructure can provide insights into the evolutionary mechanisms of cancer, and clustering single-cell copy number profiles has become an effective means of revealing intra-tumor heterogeneity (ITH). However, copy numbers inferred from single-cell DNA sequencing (scDNA-seq) data are error-prone because of technical confounding factors such as amplification bias and allelic dropout, which makes it difficult to identify ITH precisely.
We introduce a hybrid model called scGAL to infer clonal copy number substructure. It combines an autoencoder with a generative adversarial network to jointly analyze independent single-cell copy number profiles and gene expression data from the same cell line. Under an adversarial learning framework, scGAL exploits complementary information from gene expression data to mitigate the effects of noise in copy number data, and learns latent representations of scDNA-seq cells for accurate inference of ITH. Evaluation on three real cancer datasets suggests that scGAL accurately infers clonal architecture and outperforms comparable methods. In addition, assessment of scGAL on various simulated datasets demonstrates its robustness to changes in data size and distribution. scGAL is available at https://github.com/zhyu-lab/scgal.
Joint analysis of independent single-cell copy number and gene expression data from the same cell line can effectively exploit complementary information from the individual omics, and thus gives a more refined indication of clonal copy number substructure.
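As an illustration of how an autoencoder can be coupled with a generative adversarial network in the way the abstract describes, the following is a minimal NumPy sketch of a single forward pass and its loss terms. It is not the scGAL implementation: the layer sizes, the single-layer networks, the names `encoder`, `decoder`, `generator` and `discriminator`, and the choice to let the generator map the copy number latent space to pseudo expression profiles are all assumptions made for illustration, and the inputs are random toy data. Gradient updates and the downstream clustering of the latent representations are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Return small random weights and a zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def forward(x, layer, activation=None):
    """Apply one linear layer followed by an optional activation."""
    w, b = layer
    h = x @ w + b
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-h))
    return h


# Toy synthetic inputs (not real data): rows are cells, columns are features.
# The two modalities come from independent cells, so the row counts differ.
n_dna, n_rna, n_bins, n_genes, n_latent = 8, 6, 20, 15, 3
cn = rng.integers(0, 5, size=(n_dna, n_bins)).astype(float)  # copy number profiles
expr = rng.random((n_rna, n_genes))                          # expression scaled to [0, 1)

# Hypothetical single-layer networks (assumed architecture, for illustration only).
encoder = dense(n_bins, n_latent)       # copy numbers -> latent representation
decoder = dense(n_latent, n_bins)       # latent representation -> copy numbers
generator = dense(n_latent, n_genes)    # latent representation -> pseudo expression
discriminator = dense(n_genes, 1)       # expression -> probability of being real

z = forward(cn, encoder)                               # latent features of scDNA-seq cells
cn_hat = forward(z, decoder)                           # autoencoder reconstruction
fake_expr = forward(z, generator, "sigmoid")           # generated expression profiles
p_real = forward(expr, discriminator, "sigmoid")       # discriminator on real scRNA-seq cells
p_fake = forward(fake_expr, discriminator, "sigmoid")  # discriminator on generated profiles

eps = 1e-8  # avoids log(0)
recon_loss = np.mean((cn - cn_hat) ** 2)                                        # autoencoder term
disc_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1 - p_fake + eps))  # discriminator term
gen_loss = -np.mean(np.log(p_fake + eps))                                       # adversarial term for the generator

print(z.shape, cn_hat.shape, fake_expr.shape)
```

In a sketch of this kind, the encoder would be trained on both the reconstruction term and the adversarial term, so that the latent representation `z` is shaped by the gene expression data as well as by the noisy copy numbers; the rows of `z` would then be clustered to assign cells to clones.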