C-ziptf: stable tensor factorization for zero-inflated multi-dimensional genomics data.

Affiliations

Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.

Publication information

BMC Bioinformatics. 2024 Oct 5;25(1):323. doi: 10.1186/s12859-024-05886-4.

DOI: 10.1186/s12859-024-05886-4
PMID: 39369208
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11456250/
Abstract

Over the past two decades, genomics has advanced significantly, with single-cell RNA sequencing (scRNA-seq) marking a pivotal milestone. scRNA-seq provides unparalleled insight into cellular diversity and has spurred studies across multiple conditions and samples, resulting in an influx of complex multidimensional genomics data. This highlights the need for robust methods that can handle the complexity and multidimensionality of such data. Furthermore, single-cell data are sparse owing to issues such as low capture efficiency and dropout effects. Tensor factorizations (TF) have emerged as powerful tools for unraveling complex patterns in multidimensional genomics data. Classic TF methods, based on maximum likelihood estimation, struggle with zero-inflated count data, and the inherent stochasticity of TFs further complicates the interpretation and reproducibility of results. Our paper introduces Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel method for factorizing high-dimensional zero-inflated count data. We also present Consensus-ZIPTF (C-ZIPTF), which combines ZIPTF with a consensus-based approach to address stochasticity. We evaluate the proposed methods on synthetic zero-inflated count data, simulated scRNA-seq data, and real multi-sample multi-condition scRNA-seq datasets. ZIPTF consistently outperforms baseline matrix and tensor factorization methods, with higher reconstruction accuracy on zero-inflated data. When the probability of excess zeros is high, ZIPTF achieves better accuracy than these baselines. Moreover, C-ZIPTF notably improves the consistency of the factorization. On both synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently uncover known and biologically meaningful gene expression programs. Data and code are available at https://github.com/klarman-cell-observatory/scBTF and https://github.com/klarman-cell-observatory/scbtf_experiments.
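
As a rough illustration of why zero inflation matters for likelihood-based fitting, the sketch below simulates zero-inflated Poisson (ZIP) counts in plain Python and compares a plain Poisson rate estimate with a simple method-of-moments ZIP estimate. It is not the ZIPTF algorithm and does not use the scBTF package. The function names, the method-of-moments estimator, and the parameter values (rate 4.0, zero-inflation probability 0.6) are assumptions chosen for illustration only.

```python
import math
import random


def zip_pmf(k: int, rate: float, pi_zero: float) -> float:
    """P(X = k) for a zero-inflated Poisson variable.

    With probability pi_zero the value is a structural zero; otherwise it is
    drawn from Poisson(rate).
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    structural = pi_zero if k == 0 else 0.0
    return structural + (1.0 - pi_zero) * poisson


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson(rate) value with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zip(n: int, rate: float, pi_zero: float, rng: random.Random) -> list:
    """Draw n zero-inflated Poisson counts."""
    return [
        0 if rng.random() < pi_zero else sample_poisson(rate, rng)
        for _ in range(n)
    ]


def fit_zip_moments(counts: list) -> tuple:
    """Method-of-moments estimates (rate, pi_zero) for a ZIP sample.

    Uses mean = (1 - pi) * rate and var = mean * (1 + pi * rate).
    This is an illustrative estimator, not the inference used by ZIPTF.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if mean == 0.0 or var <= mean:
        # No overdispersion detected: fall back to a plain Poisson fit.
        return mean, 0.0
    rate = mean + var / mean - 1.0
    pi_zero = 1.0 - mean / rate
    return rate, pi_zero


if __name__ == "__main__":
    rng = random.Random(0)
    counts = sample_zip(20000, rate=4.0, pi_zero=0.6, rng=rng)
    poisson_rate = sum(counts) / len(counts)  # Poisson MLE is the sample mean
    zip_rate, zip_pi = fit_zip_moments(counts)
    print(f"plain Poisson rate estimate: {poisson_rate:.2f}")
    print(f"ZIP estimates: rate={zip_rate:.2f}, pi_zero={zip_pi:.2f}")
```

In this setup the plain Poisson estimate targets (1 - pi_zero) * rate rather than the rate itself, which is the kind of bias a zero-inflated model is meant to avoid.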

Cited by

1
Genome-scale spatial mapping of the Hodgkin lymphoma microenvironment identifies tumor cell survival factors.
bioRxiv. 2025 Jan 25:2025.01.24.631210. doi: 10.1101/2025.01.24.631210.
