UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA.
Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA.
Bioinformatics. 2022 Jan 3;38(2):476-486. doi: 10.1093/bioinformatics/btab706.
Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As the need grows to integrate single-cell omics data across sources, types and features, so do the challenges of integration. Here, we present SMILE (Single-cell Mutual Information Learning), an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information.
Using a unique cell-pairing design, SMILE successfully integrates multisource single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into a shared space. SMILE can also integrate data from two or more modalities, such as those produced by joint-profiling technologies that combine single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint-profiling technologies can then be used as a framework for comparing independent single-source data.
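To make the idea of learning from paired cells by maximizing mutual information more concrete, the following is a minimal sketch of one common contrastive objective, InfoNCE, whose value gives a lower bound on the mutual information between two views. The abstract does not state which estimator SMILE uses, so the choice of InfoNCE, the function names (`cosine`, `info_nce`, `mi_lower_bound`), the temperature value and the toy "RNA" and "ATAC" vectors are all assumptions made for illustration; this is not the code from the SMILE repository. In a real setting the vectors would be embeddings produced by trained encoders rather than random numbers.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(z_a, z_b, temperature=0.1):
    """Mean InfoNCE loss, treating z_a[i] and z_b[i] as the same cell.

    For each cell i, its partner z_b[i] is the positive and every other
    z_b[j] in the batch acts as a negative.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


def mi_lower_bound(loss, n):
    """InfoNCE bound: I(A; B) >= log(n) - loss, for a batch of n pairs."""
    return math.log(n) - loss


# Toy data (hypothetical): 16 cells with 8-dimensional embeddings.
rng = random.Random(0)
rna = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
# "ATAC" embeddings that agree with their paired RNA embeddings up to noise.
atac_aligned = [[x + 0.05 * rng.gauss(0, 1) for x in cell] for cell in rna]
# The same embeddings with the pairing broken by a cyclic shift.
atac_shuffled = atac_aligned[1:] + atac_aligned[:1]

loss_aligned = info_nce(rna, atac_aligned)
loss_shuffled = info_nce(rna, atac_shuffled)
print("aligned pairs :", loss_aligned, mi_lower_bound(loss_aligned, len(rna)))
print("shuffled pairs:", loss_shuffled, mi_lower_bound(loss_shuffled, len(rna)))
```

Correctly paired cells yield a lower loss, and therefore a higher mutual-information bound, than mismatched pairs. An encoder trained to minimize such a loss is pushed to place the two views of the same cell close together in the shared space, regardless of which batch or modality each view came from.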
The source code of SMILE, including analyses of key results in the study, is implemented in Python and available at https://github.com/rpmccordlab/SMILE.
Supplementary data are available at Bioinformatics online.