Exploring high-dimensional biological data with sparse contrastive principal component analysis.

Affiliations

Graduate Group in Biostatistics.

Center for Computational Biology.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3422-3430. doi: 10.1093/bioinformatics/btaa176.

DOI:10.1093/bioinformatics/btaa176
PMID:32176249
Abstract

MOTIVATION

Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

RESULTS

Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, that extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.
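
To make the idea of contrasting a target dataset against control data more concrete, the following is a minimal Python sketch, not the authors' scPCA implementation. It assumes the common contrastive PCA formulation, in which the leading eigenvectors of the target covariance minus a scaled background covariance are taken as loadings, and it uses simple soft-thresholding as a stand-in for a proper sparse PCA step. The function name `contrastive_pca`, the parameters `alpha` and `threshold`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def contrastive_pca(target, background, alpha, n_components=2, threshold=0.0):
    """Illustrative contrastive PCA with crude sparsification (assumed sketch).

    target, background: arrays of shape (n_samples, n_features).
    alpha: contrastive parameter weighting the background covariance.
    threshold: soft-threshold applied to loadings (stand-in for sparse PCA).
    """
    # Center each dataset column-wise.
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)

    # Sample covariance matrices of the target and background data.
    cov_t = t.T @ t / (t.shape[0] - 1)
    cov_b = b.T @ b / (b.shape[0] - 1)

    # Contrastive covariance: variation in the target that the background lacks.
    contrast = cov_t - alpha * cov_b

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # Soft-threshold small loadings to zero, then rescale to unit norm.
    sparse = np.sign(loadings) * np.maximum(np.abs(loadings) - threshold, 0.0)
    norms = np.linalg.norm(sparse, axis=0)
    norms[norms == 0] = 1.0
    sparse = sparse / norms

    # Project the centered target data onto the sparse loadings.
    return sparse, t @ sparse


# Synthetic example (assumed data): feature 0 carries technical noise shared
# by both datasets, feature 5 carries a group signal present only in target.
rng = np.random.default_rng(0)
background = rng.normal(size=(200, 10))
background[:, 0] *= 10.0
target = rng.normal(size=(200, 10))
target[:, 0] *= 10.0
labels = np.repeat([0, 1], 100)
target[:, 5] += 6.0 * labels

loadings, scores = contrastive_pca(target, background, alpha=2.0, threshold=0.1)
top_feature = int(np.argmax(np.abs(loadings[:, 0])))
print("feature with largest loading on component 1:", top_feature)
```

In this toy setting ordinary PCA on the target alone would be dominated by the noisy feature 0, whereas subtracting the background covariance downweights it so that the first contrastive component concentrates on the feature carrying the group signal.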

AVAILABILITY AND IMPLEMENTATION

A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.

CONTACT

philippe_boileau@berkeley.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Exploring high-dimensional biological data with sparse contrastive principal component analysis.
   Bioinformatics. 2020 Jun 1;36(11):3422-3430. doi: 10.1093/bioinformatics/btaa176.
2. Meta-analytic principal component analysis in integrative omics application.
   Bioinformatics. 2018 Apr 15;34(8):1321-1328. doi: 10.1093/bioinformatics/btx765.
3. Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data.
   Bioinformatics. 2015 Aug 15;31(16):2683-90. doi: 10.1093/bioinformatics/btv197. Epub 2015 Apr 10.
4. Edge-group sparse PCA for network-guided high dimensional data analysis.
   Bioinformatics. 2018 Oct 15;34(20):3479-3487. doi: 10.1093/bioinformatics/bty362.
5. projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering.
   Bioinformatics. 2020 Jun 1;36(11):3592-3593. doi: 10.1093/bioinformatics/btaa183.
6. Simultaneous dimension reduction and adjustment for confounding variation.
   Proc Natl Acad Sci U S A. 2016 Dec 20;113(51):14662-14667. doi: 10.1073/pnas.1617317113. Epub 2016 Dec 7.
7. Learning sparse log-ratios for high-throughput sequencing data.
   Bioinformatics. 2021 Dec 22;38(1):157-163. doi: 10.1093/bioinformatics/btab645.
8. SCell: integrated analysis of single-cell RNA-seq data.
   Bioinformatics. 2016 Jul 15;32(14):2219-20. doi: 10.1093/bioinformatics/btw201. Epub 2016 Apr 19.
9. fastNGSadmix: admixture proportions and principal component analysis of a single NGS sample.
   Bioinformatics. 2017 Oct 1;33(19):3148-3150. doi: 10.1093/bioinformatics/btx474.
10. NetSHy: network summarization via a hybrid approach leveraging topological properties.
    Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac818.

Cited by

1. PhytoCluster: a generative deep learning model for clustering plant single-cell RNA-seq data.
   aBIOTECH. 2025 Feb 20;6(2):189-201. doi: 10.1007/s42994-025-00196-6. eCollection 2025 Jun.
2. Deep learning in single-cell and spatial transcriptomics data analysis: advances and challenges from a data science perspective.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf136.
3. Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation.
   Front Bioinform. 2025 Feb 12;5:1519468. doi: 10.3389/fbinf.2025.1519468. eCollection 2025.
4. Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA.
   PLoS Comput Biol. 2025 Feb 7;21(2):e1012747. doi: 10.1371/journal.pcbi.1012747. eCollection 2025 Feb.
5. The discrete empirical interpolation method in class identification and data summarization.
   Wiley Interdiscip Rev Comput Stat. 2024 May-Jun;16(3). doi: 10.1002/wics.1653. Epub 2024 May 5.
6. Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA.
   bioRxiv. 2024 Aug 9:2024.08.08.607264. doi: 10.1101/2024.08.08.607264.
7. Single-cell omics: experimental workflow, data analyses and applications.
   Sci China Life Sci. 2025 Jan;68(1):5-102. doi: 10.1007/s11427-023-2561-0. Epub 2024 Jul 23.
8. Network Comparison with Interpretable Contrastive Network Representation Learning.
   J Data Sci Stat Vis. 2022 Sep 7;2(5). doi: 10.52933/jdssv.v2i5.56.
9. Contrastive multiple correspondence analysis (cMCA): Using contrastive learning to identify latent subgroups in political parties.
   PLoS One. 2023 Jul 10;18(7):e0287180. doi: 10.1371/journal.pone.0287180. eCollection 2023.
10. An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy.
    Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad028. Epub 2023 Apr 26.