Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA.

Authors

de Oliveira Eliezyer Fermino, Garg Pranjal, Hjerling-Leffler Jens, Batista-Brito Renata, Sjulson Lucas

Affiliations

Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY.

All India Institute of Medical Sciences, Rishikesh, India.

Publication information

bioRxiv. 2024 Aug 9:2024.08.08.607264. doi: 10.1101/2024.08.08.607264.

DOI: 10.1101/2024.08.08.607264
PMID: 39149388
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11326262/
Abstract

High-dimensional data have become ubiquitous in the biological sciences, and it is often desirable to compare two datasets collected under different experimental conditions to extract low-dimensional patterns enriched in one condition. However, traditional dimensionality reduction techniques cannot accomplish this because they operate on only one dataset. Contrastive principal component analysis (cPCA) has been proposed to address this problem, but it has seen little adoption because it requires tuning a hyperparameter resulting in multiple solutions, with no way of knowing which is correct. Moreover, cPCA uses foreground and background conditions that are treated differently, making it ill-suited to compare two experimental conditions symmetrically. Here we describe the development of generalized contrastive PCA (gcPCA), a flexible hyperparameter-free approach that solves these problems. We first provide analyses explaining why cPCA requires a hyperparameter and how gcPCA avoids this requirement. We then describe an open-source gcPCA toolbox containing Python and MATLAB implementations of several variants of gcPCA tailored for different scenarios. Finally, we demonstrate the utility of gcPCA in analyzing diverse high-dimensional biological data, revealing unsupervised detection of hippocampal replay in neurophysiological recordings and heterogeneity of type II diabetes in single-cell RNA sequencing data. As a fast, robust, and easy-to-use comparison method, gcPCA provides a valuable resource facilitating the analysis of diverse high-dimensional datasets to gain new insights into complex biological phenomena.
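
To illustrate the hyperparameter issue described in the abstract, the following Python sketch contrasts cPCA, which takes the eigenvectors of R_A - αR_B for a user-chosen α, with one possible hyperparameter-free alternative that normalizes the contrast by the total covariance, (R_A - R_B)/(R_A + R_B). The abstract does not spell out the gcPCA objective, so this normalized form, the function names (`covariance`, `cpca`, `normalized_contrast`), and the synthetic data are assumptions made for illustration. They are not the API of the gcPCA toolbox.

```python
import numpy as np

def covariance(X):
    """Sample covariance of a (samples x features) matrix."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def cpca(A, B, alpha):
    """cPCA: eigenvectors of R_A - alpha * R_B; the result depends on alpha."""
    Ra, Rb = covariance(A), covariance(B)
    w, V = np.linalg.eigh(Ra - alpha * Rb)
    order = np.argsort(w)[::-1]          # largest eigenvalue first
    return w[order], V[:, order]

def normalized_contrast(A, B, eps=1e-10):
    """Hypothetical hyperparameter-free contrast (an assumption, see lead-in):
    maximize x^T (R_A - R_B) x / x^T (R_A + R_B) x."""
    Ra, Rb = covariance(A), covariance(B)
    s, U = np.linalg.eigh(Ra + Rb)
    keep = s > eps * s.max()             # drop numerically null directions
    M = U[:, keep] / np.sqrt(s[keep])    # whitening: M.T @ (Ra + Rb) @ M = I
    w, Y = np.linalg.eigh(M.T @ (Ra - Rb) @ M)
    order = np.argsort(w)[::-1]
    X = M @ Y[:, order]                  # map back to feature space
    X = X / np.linalg.norm(X, axis=0)    # unit-length loading vectors
    return w[order], X

# Synthetic data (assumed): axis 0 is large and shared by both conditions,
# axis 1 is enriched in A, axis 2 is enriched in B.
rng = np.random.default_rng(0)
A = rng.normal(size=(5000, 5)) * np.array([10.0, 3.0, 1.0, 1.0, 1.0])
B = rng.normal(size=(5000, 5)) * np.array([10.0, 1.0, 3.0, 1.0, 1.0])

for alpha in (0.0, 10.0):
    _, V = cpca(A, B, alpha)
    print(f"cPCA, alpha={alpha}: dominant axis of top component:",
          np.argmax(np.abs(V[:, 0])))

w, X = normalized_contrast(A, B)
print("normalized contrast values:", np.round(w, 2))
print("dominant axis of top component:", np.argmax(np.abs(X[:, 0])))
print("dominant axis of bottom component:", np.argmax(np.abs(X[:, -1])))
```

In this sketch the normalized objective is bounded between -1 and 1, so positive values mark directions enriched in A and negative values mark directions enriched in B. Both conditions are treated symmetrically and no α has to be chosen.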


Similar articles

1
Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA.
bioRxiv. 2024 Aug 9:2024.08.08.607264. doi: 10.1101/2024.08.08.607264.
2
Identifying patterns differing between high-dimensional datasets with generalized contrastive PCA.
PLoS Comput Biol. 2025 Feb 7;21(2):e1012747. doi: 10.1371/journal.pcbi.1012747. eCollection 2025 Feb.
3
Exploring patterns enriched in a dataset with contrastive principal component analysis.
Nat Commun. 2018 May 30;9(1):2134. doi: 10.1038/s41467-018-04608-8.
4
Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning.
IEEE Trans Vis Comput Graph. 2020 Jan;26(1):45-55. doi: 10.1109/TVCG.2019.2934251. Epub 2019 Aug 19.
5
Exploring high-dimensional biological data with sparse contrastive principal component analysis.
Bioinformatics. 2020 Jun 1;36(11):3422-3430. doi: 10.1093/bioinformatics/btaa176.
6
Investigating Contrastive Pair Learning's Frontiers in Supervised, Semisupervised, and Self-Supervised Learning.
J Imaging. 2024 Aug 13;10(8):196. doi: 10.3390/jimaging10080196.
7
Nonlinear Dimensionality Reduction by Minimum Curvilinearity for Unsupervised Discovery of Patterns in Multidimensional Proteomic Data.
Methods Mol Biol. 2016;1384:289-98. doi: 10.1007/978-1-4939-3255-9_16.
8
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9
CGRclust: Chaos Game Representation for twin contrastive clustering of unlabelled DNA sequences.
BMC Genomics. 2024 Dec 18;25(1):1214. doi: 10.1186/s12864-024-11135-y.
10
Network Comparison with Interpretable Contrastive Network Representation Learning.
J Data Sci Stat Vis. 2022 Sep 7;2(5). doi: 10.52933/jdssv.v2i5.56.
