Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA.
Genome Center, University of California, Davis, CA, USA.
Nat Commun. 2024 Nov 15;15(1):9932. doi: 10.1038/s41467-024-53971-2.
Multimodal single-cell assays profile multiple sets of features in the same cells and are widely used for identifying and mapping cell states between chromatin and mRNA and for linking regulatory elements to target genes. However, the high dimensionality of the input features and the shallower sequencing depth relative to unimodal assays pose challenges for data analysis. Here we present scPair, a multimodal single-cell data framework that overcomes these challenges by employing an implicit feature selection approach. scPair uses dual encoder-decoder structures trained on paired data to align cell states across modalities and to predict the features of one modality from another. We demonstrate that scPair outperforms existing methods in accuracy and execution time and facilitates downstream tasks such as trajectory inference. We further show that scPair can augment smaller multimodal datasets with larger unimodal atlases, increasing the statistical power to identify groups of transcription factors active during different stages of neural differentiation.
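The dual encoder-decoder idea described in the abstract can be illustrated with a toy sketch. This is not scPair's code or API: the abstract does not specify the network architecture, loss functions, or alignment procedure, so everything below is an assumption made for illustration. The sketch uses plain NumPy, purely linear encoders and decoders, a mean squared error loss, and synthetic paired data generated from a shared latent cell state. The function names `make_paired_data` and `train_cross_modal` are hypothetical.

```python
import numpy as np


def make_paired_data(n_cells=200, n_rna=30, n_atac=50, n_latent=3, seed=0):
    """Simulate paired measurements driven by one shared latent cell state.

    Synthetic stand-in for a paired multimodal dataset (assumption: real
    count data would need normalization and a different likelihood).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_cells, n_latent))
    rna = z @ rng.normal(size=(n_latent, n_rna))
    rna += 0.1 * rng.normal(size=rna.shape)
    atac = z @ rng.normal(size=(n_latent, n_atac))
    atac += 0.1 * rng.normal(size=atac.shape)
    return rna, atac


def train_cross_modal(x, y, n_latent=3, lr=0.02, n_steps=1000, seed=0):
    """Fit a linear encoder (x -> latent) and decoder (latent -> y).

    Plain gradient descent on the mean squared prediction error.
    Returns the encoder, the decoder, and the loss at every step.
    """
    rng = np.random.default_rng(seed)
    enc = 0.1 * rng.normal(size=(x.shape[1], n_latent))
    dec = 0.1 * rng.normal(size=(n_latent, y.shape[1]))
    losses = []
    for _ in range(n_steps):
        z = x @ enc                      # low-dimensional cell embedding
        err = z @ dec - y                # cross-modal prediction error
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size  # d(loss) / d(prediction)
        grad_dec = z.T @ grad_out
        grad_enc = x.T @ (grad_out @ dec.T)
        dec -= lr * grad_dec
        enc -= lr * grad_enc
    return enc, dec, losses


# "Dual" structure: one encoder-decoder per prediction direction.
rna, atac = make_paired_data()
enc_r, dec_r, loss_r = train_cross_modal(rna, atac)   # RNA -> ATAC
enc_a, dec_a, loss_a = train_cross_modal(atac, rna)   # ATAC -> RNA

atac_pred = rna @ enc_r @ dec_r   # predicted chromatin features per cell
rna_pred = atac @ enc_a @ dec_a   # predicted expression features per cell
print(f"RNA->ATAC loss: {loss_r[0]:.3f} -> {loss_r[-1]:.3f}")
print(f"ATAC->RNA loss: {loss_a[0]:.3f} -> {loss_a[-1]:.3f}")
```

Each direction gets its own encoder and decoder, so `rna @ enc_r` and `atac @ enc_a` are two separate embeddings of the same cells. The sketch stops at cross-modal prediction; aligning the two embeddings, as the abstract describes, would need an additional mapping step that is not shown here.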