HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data.

Affiliations

College of Computer Science, Nankai University, 300350 Tianjin, China.

Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China.

Publication information

Bioinformatics. 2022 Feb 7;38(5):1295-1303. doi: 10.1093/bioinformatics/btab821.

DOI:10.1093/bioinformatics/btab821
PMID:34864918
Abstract

MOTIVATION

With the development of single-cell RNA sequencing (scRNA-seq) techniques, an increasing number of large-scale gene expression datasets are becoming available. However, to analyze datasets produced by different experiments, batch effects among the datasets must be considered. Although several methods have recently been published to remove batch effects in scRNA-seq data, two problems remain challenging and are not completely solved: (i) how to reduce the distribution differences between batches more accurately; and (ii) how to align samples from different batches to recover the cell type clusters.

RESULTS

We propose a novel deep-learning approach: a hierarchical distribution-matching framework assisted by contrastive learning, which addresses these two problems. First, we design a hierarchical framework for distribution matching based on a deep autoencoder. This framework employs an adversarial training strategy to match the global distributions of different batches, which provides an improved foundation for further matching the local distributions with a maximum mean discrepancy-based loss. For local matching, we divide the cells in each batch into clusters and develop a contrastive learning mechanism that simultaneously aligns similar cluster pairs and keeps noisy pairs apart. This makes it possible to obtain clusters in which all cells are of the same type (true positives) and to avoid clusters containing cells of different types (false positives). We demonstrate the effectiveness of our method on both simulated and real datasets. The results show that our method significantly outperforms state-of-the-art methods and can prevent overcorrection.
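
To make the maximum mean discrepancy (MMD) term mentioned above more concrete, the following is a minimal NumPy sketch of a biased squared-MMD estimate with a Gaussian kernel. It is an illustration only and is not taken from the HDMC repository: the function names, the single fixed bandwidth, and the synthetic "clusters" are all assumptions made for this example, and the abstract does not specify which kernel or estimator the authors use.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical low-dimensional embeddings of three cell clusters (synthetic data).
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 1.0, size=(200, 8))
cluster_b = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as cluster_a
cluster_c = rng.normal(3.0, 1.0, size=(200, 8))  # mean-shifted, mimicking a batch effect

mmd_same = mmd_squared(cluster_a, cluster_b, bandwidth=2.0)
mmd_shifted = mmd_squared(cluster_a, cluster_c, bandwidth=2.0)
print(f"MMD^2 (same distribution):    {mmd_same:.4f}")
print(f"MMD^2 (shifted distribution): {mmd_shifted:.4f}")
```

In a training setting, a quantity of this kind would be computed on the autoencoder's latent codes for a pair of clusters judged to be similar and then minimized, so that the two clusters are pulled toward the same distribution. The shifted pair in the sketch yields a larger value than the matched pair, which is the behavior such a loss relies on.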

AVAILABILITY AND IMPLEMENTATION

The Python code used to generate the results and figures in this article is available at https://github.com/zhanglabNKU/HDMC. The data underlying this article are also available in this GitHub repository.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
