
PbImpute: Precise Zero Discrimination and Balanced Imputation in Single-Cell RNA Sequencing Data.

Authors

Zhang Yi, Wang Yin, Liu Xinyuan, Feng Xi

Affiliations

School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China.

Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China.

Publication information

J Chem Inf Model. 2025 Mar 10;65(5):2670-2684. doi: 10.1021/acs.jcim.4c02125. Epub 2025 Feb 17.


DOI:10.1021/acs.jcim.4c02125
PMID:39957720
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11898086/
Abstract

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology for elucidating cellular heterogeneity at unprecedented resolution. However, technical limitations such as limited sequencing depth and mRNA capture efficiency often result in zero counts, commonly referred to as "dropout zeros" in scRNA-seq data. These zeros pose significant challenges to downstream analysis, as they can distort the interpretation of cellular transcriptomes. While numerous computational methods have been developed to address this challenge, existing approaches frequently suffer from either insufficient imputation of zeros (under-imputation) or excessive modification of zeros (over-imputation). Here, we propose a precisely balanced imputation (PbImpute) method designed to achieve optimal equilibrium between dropout recovery and biological zero preservation in scRNA-seq data. PbImpute employs a multistage approach: (1) Initial discrimination between technical dropouts and biological zeros through parameter optimization of a new zero-inflated negative binomial (ZINB) distribution model, followed by initial imputation; (2) Application of a uniquely designed static repair algorithm to enhance data fidelity; (3) Secondary dropout identification based on gene expression frequency and partition-specific coefficient of variation; (4) Graph-embedding neural network-based imputation; and (5) Implementation of a uniquely designed dynamic repair mechanism to mitigate over-imputation effects. PbImpute distinguishes itself by uniquely integrating ZINB modeling with static and dynamic repair. This advantageous combined approach achieves a balance between over- and under-imputation, while simultaneously preserving true biological zeros and reducing signal distortion. 
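Stage (1) rests on a zero-inflated negative binomial model. As a rough illustration only, the sketch below uses the textbook ZINB parameterization (not the paper's new variant, whose details are not given in the abstract) to compute the posterior probability that an observed zero comes from the zero-inflation component, which is commonly read as a technical dropout. The function names and the parameters `pi`, `mu`, and `theta` are assumptions introduced for this example.

```python
import numpy as np


def nb_zero_prob(mu, theta):
    """Probability of a zero count under a negative binomial with
    mean `mu` and dispersion `theta`: (theta / (theta + mu)) ** theta."""
    mu = np.asarray(mu, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return (theta / (theta + mu)) ** theta


def zero_inflation_posterior(pi, mu, theta):
    """Posterior probability that an observed zero was produced by the
    zero-inflation component of a standard ZINB model.

    Assumes 0 < pi < 1 (mixing weight of the zero-inflation component).
    P(zero) = pi + (1 - pi) * NB(0; mu, theta), so by Bayes' rule the
    posterior is pi / P(zero).
    """
    pi = np.asarray(pi, dtype=float)
    nb0 = nb_zero_prob(mu, theta)
    return pi / (pi + (1.0 - pi) * nb0)


# Hypothetical parameters for two genes: one lowly and one highly expressed.
posterior = zero_inflation_posterior(
    pi=np.array([0.5, 0.5]),
    mu=np.array([1.0, 100.0]),
    theta=np.array([1.0, 1.0]),
)
```

With the same mixing weight, a zero in the highly expressed gene receives a much higher posterior than a zero in the lowly expressed gene, because the negative binomial component alone rarely produces a zero at a large mean. A threshold on this posterior is one plausible way to separate candidate dropouts from zeros that should be preserved.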
Comprehensive evaluation using both simulated and real scRNA-seq data sets demonstrated that PbImpute achieves superior performance (F1 Score = 0.88 at 83% dropout rate, ARI = 0.78 on PBMC) in discriminating between technical dropouts and biological zeros compared to state-of-the-art methods. The method significantly improves gene-gene and cell-cell correlation structures, enhances differential expression analysis sensitivity, optimizes clustering resolution and dimensional reduction visualization, and facilitates more accurate trajectory inference. Ablation studies confirmed the essential contribution of both the imputation and repair modules to the method's performance. The code is available at https://github.com/WyBioTeam/PbImpute. By enhancing the accuracy of scRNA-seq data imputation, PbImpute can improve the identification of cell subpopulations and the detection of differentially expressed genes, thereby facilitating more precise analyses of cellular heterogeneity and advancing disease research.
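The F1 score reported for dropout discrimination can be illustrated on simulated data, where the true dropout positions are known. The following is a minimal sketch, assuming boolean masks for true and predicted dropouts restricted to zero entries of the observed matrix; the function name `dropout_f1` and the toy matrices are hypothetical and are not taken from the PbImpute code base.

```python
import numpy as np


def dropout_f1(observed, true_dropout, predicted_dropout):
    """F1 score for labeling zeros as technical dropouts.

    Only entries where `observed` is zero are scored. A zero that is a
    true dropout and predicted as one is a true positive; a biological
    zero predicted as a dropout is a false positive; a missed dropout
    is a false negative.
    """
    observed = np.asarray(observed)
    true_dropout = np.asarray(true_dropout, dtype=bool)
    predicted_dropout = np.asarray(predicted_dropout, dtype=bool)

    zeros = observed == 0
    tp = np.sum(zeros & true_dropout & predicted_dropout)
    fp = np.sum(zeros & ~true_dropout & predicted_dropout)
    fn = np.sum(zeros & true_dropout & ~predicted_dropout)

    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is positive.
    return float(2 * tp / denom) if denom > 0 else 0.0


# Toy 2 x 3 count matrix (cells x genes) with four zero entries.
observed = np.array([[0, 3, 0],
                     [0, 0, 5]])
true_dropout = np.array([[True, False, False],
                         [True, False, False]])
predicted_dropout = np.array([[True, False, True],
                              [False, False, False]])

score = dropout_f1(observed, true_dropout, predicted_dropout)
```

In this toy case there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1 score is 0.5.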
