Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, USA.
Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, USA.
BMC Bioinformatics. 2023 Jan 4;24(1):5. doi: 10.1186/s12859-022-05126-7.
Single-cell omics technologies are rapidly developing to measure the epigenome, genome, and transcriptome across a range of cell types. However, integrating omics data from different modalities remains challenging. Here, we propose MinNet, a variation of the Siamese neural network framework that is trained with a graph-based contrastive loss to integrate multi-omics data at single-cell resolution.
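As a rough illustration of the contrastive objective behind a Siamese-style integration model, the sketch below computes a generic margin-based contrastive loss over pairs of cell embeddings from two modalities. It is not MinNet's actual loss, which the abstract does not specify. The function name `contrastive_loss`, the pair labels, and the per-pair margins are all assumptions made for illustration. In a graph-based variant, the margins could plausibly be derived from distances on a cell-cell neighbor graph.

```python
import math

def contrastive_loss(z_a, z_b, is_match, margins):
    """Generic margin-based contrastive loss (illustrative, not MinNet's exact loss).

    z_a, z_b : lists of embedding vectors, one pair per entry
               (e.g. an scRNA-seq cell and an scATAC-seq cell).
    is_match : 1 if the pair should be pulled together, 0 if pushed apart.
    margins  : per-pair margin for non-matching pairs; assumed here to come
               from some graph distance between the two cells (hypothetical).
    """
    total = 0.0
    for a, b, y, m in zip(z_a, z_b, is_match, margins):
        d = math.dist(a, b)  # Euclidean distance in the shared embedding space
        if y:
            # Matching pairs are penalized by their squared distance.
            total += d ** 2
        else:
            # Non-matching pairs are penalized only when closer than the margin.
            total += max(0.0, m - d) ** 2
    return total / len(is_match)

# One matching pair that is far apart, one non-matching pair that overlaps.
loss = contrastive_loss(
    z_a=[[0.0, 0.0], [0.0, 0.0]],
    z_b=[[3.0, 4.0], [0.0, 0.0]],
    is_match=[1, 0],
    margins=[1.0, 1.0],
)
print(loss)
```

Minimizing such a loss pulls matched cells from the two modalities together in the shared space while keeping unrelated cells at least a margin apart, which is the general idea a Siamese architecture relies on.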
By training the model and testing it on several benchmark datasets, we showed its accuracy and generalizability in integrating scRNA-seq with scATAC-seq, and scRNA-seq with epitope data. Further evaluation demonstrated our model's unique ability to remove batch effects, a common problem in practice. To show how the integration impacts downstream analysis, we established a model-based method for smoothing and cis-regulatory element inference and validated it with external pcHi-C evidence. Finally, we applied the framework to a COVID-19 dataset to bolster the original work with integration-based analysis, showing its necessity in single-cell multi-omics research.
MinNet is a novel deep-learning framework for integrating single-cell multi-omics sequencing data. It ranked first among the compared methods in benchmarking and is especially suitable for integrating datasets with batch and biological variance. With integration results at single-cell resolution, the interplay between genome and transcriptome can be analyzed to help researchers understand their data and research questions.