Unsupervised neural network for single cell Multi-omics INTegration (UMINT): an application to health and disease.

Author information

Maitra Chayan, Seal Dibyendu B, Das Vivek, De Rajat K

Affiliations

Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.

Tatras Data Services Pvt. Ltd., New Delhi, India.

Publication information

Front Mol Biosci. 2023 May 24;10:1184748. doi: 10.3389/fmolb.2023.1184748. eCollection 2023.

Abstract

Multi-omics studies have enabled us to understand the mechanistic drivers behind complex disease states and progressions, thereby providing novel and actionable biological insights into health status. However, integrating data from multiple modalities is challenging because of the high dimensionality and diverse nature of the data and the noise associated with each platform. Data sparsity, non-overlapping features and technical batch effects further complicate learning. Conventional machine learning (ML) tools are not very effective against such data integration hazards because of their simple structure and limited capacity. In addition, existing methods for single cell multi-omics integration are computationally expensive. Therefore, in this work, we introduce a novel Unsupervised neural network for single cell Multi-omics INTegration (UMINT). UMINT is a promising model for integrating a variable number of high-dimensional single cell omics layers. It has a lightweight architecture with a substantially reduced number of parameters. The proposed model learns a low-dimensional latent embedding that extracts useful features from the data, facilitating further downstream analyses. UMINT has been applied to integrate healthy and disease CITE-seq (paired RNA and surface protein) datasets, including one from a rare disease, Mucosa-Associated Lymphoid Tissue (MALT) tumor. It has been benchmarked against existing state-of-the-art methods for single cell multi-omics integration. Furthermore, UMINT can also integrate paired single cell gene expression and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) assays.
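The abstract describes the model only at a high level: one input per omics layer, a shared low-dimensional latent embedding learned without labels, and a small parameter count. The sketch below is a hypothetical illustration of that general idea as a multi-input autoencoder in NumPy. It is not the authors' implementation; the layer sizes, ReLU activations, mean-squared-error objective and every function name (init_params, encode, reconstruct, reconstruction_loss, count_params) are assumptions. It covers only the forward pass and the reconstruction objective; training would minimise that loss by gradient descent, typically in a deep learning framework.

```python
import numpy as np


def init_params(input_dims, hidden_dims, latent_dim, seed=0):
    """Initialise a hypothetical multi-input autoencoder.

    input_dims[k]  : number of features in omics layer k (e.g. genes, proteins)
    hidden_dims[k] : width of the modality-specific hidden layer for layer k
    latent_dim     : size of the shared low-dimensional embedding
    """
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        # Small random weights and a zero bias for one fully connected layer
        return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

    concat = sum(hidden_dims)
    return {
        "enc": [dense(d, h) for d, h in zip(input_dims, hidden_dims)],
        "to_latent": dense(concat, latent_dim),
        "from_latent": dense(latent_dim, concat),
        "dec": [dense(h, d) for d, h in zip(input_dims, hidden_dims)],
        "hidden_dims": list(hidden_dims),
    }


def relu(x):
    return np.maximum(x, 0.0)


def encode(params, views):
    # Each modality is compressed by its own layer first, so layers with very
    # different feature counts (thousands of genes vs. tens of proteins) are
    # brought to a comparable size before being merged
    hidden = [relu(x @ W + b) for x, (W, b) in zip(views, params["enc"])]
    W, b = params["to_latent"]
    # Concatenate the per-modality codes and project to the shared embedding
    return relu(np.concatenate(hidden, axis=1) @ W + b)


def reconstruct(params, z):
    W, b = params["from_latent"]
    h = relu(z @ W + b)
    # Split the shared hidden layer back into one block per modality
    splits = np.cumsum(params["hidden_dims"])[:-1]
    blocks = np.split(h, splits, axis=1)
    return [blk @ V + c for blk, (V, c) in zip(blocks, params["dec"])]


def reconstruction_loss(params, views):
    # Unsupervised objective: sum of per-modality mean squared errors
    recon = reconstruct(params, encode(params, views))
    return sum(np.mean((x - r) ** 2) for x, r in zip(views, recon))


def count_params(params):
    layers = params["enc"] + [params["to_latent"], params["from_latent"]] + params["dec"]
    return sum(W.size + b.size for W, b in layers)


# Usage on synthetic data: 100 cells, 2000 "genes" and 25 "surface proteins"
rng = np.random.default_rng(1)
rna = rng.random((100, 2000))
adt = rng.random((100, 25))
params = init_params([2000, 25], [64, 8], latent_dim=16)
z = encode(params, [rna, adt])
print(z.shape)               # → (100, 16)
print(count_params(params))  # → 260889
```

Because each omics layer has its own encoder and decoder, integrating a third layer only means passing longer lists to init_params and encode. In this toy configuration almost all of the 260,889 parameters sit in the RNA encoder and decoder, which is why keeping the per-modality hidden layers narrow matters most for model size.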

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88ad/10244650/a3e781bc74d4/fmolb-10-1184748-g001.jpg
