Propensity score matching enables batch-effect-corrected imputation in single-cell RNA-seq analysis.

Affiliations

School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, 100081, China.

Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.

Publication information

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac275.

DOI: 10.1093/bib/bbac275
PMID: 35821114

Abstract

Developments of single-cell RNA sequencing (scRNA-seq) technologies have enabled biological discoveries at the single-cell resolution with high throughput. However, large scRNA-seq datasets always suffer from massive technical noises, including batch effects and dropouts, and the dropout is often shown to be batch-dependent. Most existing methods only address one of the problems, and we show that the popularly used methods failed in trading off batch effect correction and dropout imputation. Here, inspired by the idea of causal inference, we propose a novel propensity score matching method for scRNA-seq data (scPSM) by borrowing information and taking the weighted average from similar cells in the deep sequenced batch, which simultaneously removes the batch effect, imputes dropout and denoises data in the entire gene expression space. The proposed method is testified on two simulation datasets and a variety of real scRNA-seq datasets, and the results show that scPSM is superior to other state-of-the-art methods. First, scPSM improves clustering accuracy and mixes cells of the same type, suggesting its ability to keep cell type separation while correcting for batch. Besides, using the scPSM-integrated data as input yields results free of batch effects or dropouts in the differential expression analysis. Moreover, scPSM not only achieves ideal denoising but also preserves real biological structure for downstream gene-based analyses. Furthermore, scPSM is robust to hyperparameters and small datasets with a few cells but enormous genes. Comprehensive evaluations demonstrate that scPSM jointly provides desirable batch effect correction, imputation and denoising for recovering the biologically meaningful expression in scRNA-seq data.
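The abstract describes borrowing information from similar cells in the deeply sequenced batch and taking a weighted average. The sketch below illustrates that general idea with generic propensity score matching on toy data. It is not the scPSM implementation: the logistic propensity model, the inverse-distance weights, the choice of `k`, and all function names (`fit_logistic`, `psm_impute`) are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def psm_impute(expr, covariates, is_deep, k=2, eps=1e-6):
    """Replace each shallow-batch cell with a weighted average of its k
    nearest deep-batch cells, where nearness is measured on the propensity
    score (estimated probability of belonging to the deep batch).
    Hypothetical helper, not the published method."""
    deep_idx = [i for i, d in enumerate(is_deep) if d == 1]
    if not deep_idx:
        raise ValueError("at least one deep-batch cell is required")
    w, b = fit_logistic(covariates, is_deep)
    scores = [sigmoid(dot(w, x) + b) for x in covariates]
    corrected = [row[:] for row in expr]  # deep-batch cells are left unchanged
    for i, d in enumerate(is_deep):
        if d == 1:
            continue
        nearest = sorted(deep_idx, key=lambda j: abs(scores[j] - scores[i]))[:k]
        # Inverse-distance weights: closer propensity scores count more.
        weights = [1.0 / (abs(scores[j] - scores[i]) + eps) for j in nearest]
        total = sum(weights)
        corrected[i] = [
            sum(wt * expr[j][g] for wt, j in zip(weights, nearest)) / total
            for g in range(len(expr[i]))
        ]
    return corrected, scores


# Toy data: 5 cells x 2 genes; the last two cells come from a shallow batch
# and contain zeros that stand in for dropouts.
expr = [[5.0, 3.0], [6.0, 2.0], [4.0, 4.0], [0.0, 1.0], [2.0, 0.0]]
covariates = [[1.0], [1.2], [0.8], [0.9], [1.1]]  # e.g. a low-dimensional embedding
is_deep = [1, 1, 1, 0, 0]

corrected, scores = psm_impute(expr, covariates, is_deep, k=2)
for row in corrected:
    print([round(v, 3) for v in row])
```

Because each corrected shallow-batch cell is a convex combination of deep-batch cells, its values stay within the range observed in the deep batch, which is one simple way to see how matching can fill in dropouts and align batches at the same time.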

Similar articles

1. Propensity score matching enables batch-effect-corrected imputation in single-cell RNA-seq analysis.
   Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac275.
2. A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics.
   Genome Res. 2021 Oct;31(10):1753-1766. doi: 10.1101/gr.271874.120. Epub 2021 May 25.
3. CDSImpute: An ensemble similarity imputation method for single-cell RNA sequence dropouts.
   Comput Biol Med. 2022 Jul;146:105658. doi: 10.1016/j.compbiomed.2022.105658. Epub 2022 May 21.
4. CL-Impute: A contrastive learning-based imputation for dropout single-cell RNA-seq data.
   Comput Biol Med. 2023 Sep;164:107263. doi: 10.1016/j.compbiomed.2023.107263. Epub 2023 Jul 23.
5. Accurate and interpretable gene expression imputation on scRNA-seq data using IGSimpute.
   Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad124.
6. SinCWIm: An imputation method for single-cell RNA sequence dropouts using weighted alternating least squares.
   Comput Biol Med. 2024 Mar;171:108225. doi: 10.1016/j.compbiomed.2024.108225. Epub 2024 Feb 27.
7. GE-Impute: graph embedding-based imputation for single-cell RNA-seq data.
   Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac313.
8. scIGANs: single-cell RNA-seq imputation using generative adversarial networks.
   Nucleic Acids Res. 2020 Sep 4;48(15):e85. doi: 10.1093/nar/gkaa506.
9. AutoImpute: Autoencoder based imputation of single-cell RNA-seq data.
   Sci Rep. 2018 Nov 5;8(1):16329. doi: 10.1038/s41598-018-34688-x.
10. AGImpute: imputation of scRNA-seq data based on a hybrid GAN with dropouts identification.
    Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae068.

Articles citing this paper

1. Assessing the impact of batch effect associated missing values on downstream analysis in high-throughput biomedical data.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf168.
2. Thinking points for effective batch correction on biomedical data.
   Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae515.
3. [Imputation method for dropout in single-cell transcriptome data].
   Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2023 Aug 25;40(4):778-783. doi: 10.7507/1001-5515.202301009.
4. Batch alignment of single-cell transcriptomics data using deep metric learning.
   Nat Commun. 2023 Feb 21;14(1):960. doi: 10.1038/s41467-023-36635-5.
5. Leveraging data-driven self-consistency for high-fidelity gene expression recovery.
   Nat Commun. 2022 Nov 21;13(1):7142. doi: 10.1038/s41467-022-34595-w.