Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data.

Authors

Mirkes Evgeny M, Bac Jonathan, Fouché Aziz, Stasenko Sergey V, Zinovyev Andrei, Gorban Alexander N

Affiliations

School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK.

Institut Curie, PSL Research University, 75005 Paris, France.

Publication

Entropy (Basel). 2022 Dec 24;25(1):33. doi: 10.3390/e25010033.

DOI:10.3390/e25010033
PMID:36673174
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9858254/
Abstract

Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications, leading to reduced dataset representations that take into account possible divergence between source and target domains.
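
The abstract describes DAPCA only at a high level: pairwise positive and negative weights, and one quadratic problem per iteration. The sketch below is one way such a scheme could look in NumPy and is not the authors' implementation. The function names, the parameters `alpha`, `beta`, `gamma` and `k`, and the specific rule "attract each target point to its k nearest source points in the current projection" are all assumptions made for illustration. The one standard fact it relies on is that, for symmetric weights W, maximizing the weighted sum of squared projected pairwise distances reduces to taking the top eigenvectors of X^T L X, where L = diag(row sums of W) - W.

```python
import numpy as np

def pairwise_weighted_pca(X, W, n_components):
    """Orthonormal directions V maximizing sum_ij W_ij * ||V^T x_i - V^T x_j||^2."""
    W = (W + W.T) / 2.0                      # the objective only sees the symmetric part
    L = np.diag(W.sum(axis=1)) - W           # Laplacian-like matrix built from the weights
    Q = X.T @ L @ X                          # d x d matrix of the quadratic form
    Q = (Q + Q.T) / 2.0                      # remove round-off asymmetry before eigh
    vals, vecs = np.linalg.eigh(Q)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]

def base_weights(y_source, n_target, alpha=1.0, beta=1.0):
    """Assumed weights: repel different classes, attract same class, spread target."""
    y_source = np.asarray(y_source)
    n_s = len(y_source)
    W = np.zeros((n_s + n_target, n_s + n_target))
    same = y_source[:, None] == y_source[None, :]
    W[:n_s, :n_s] = np.where(same, -alpha, 1.0)  # negative weight = attraction
    W[n_s:, n_s:] = beta                         # positive weight = repulsion
    np.fill_diagonal(W, 0.0)
    return W

def dapca_sketch(Xs, ys, Xt, n_components=2, k=3,
                 alpha=1.0, beta=1.0, gamma=1.0, max_iter=20):
    """Illustrative iteration: re-link target points to source neighbours, re-solve."""
    X = np.vstack([Xs, Xt])
    X = X - X.mean(axis=0)
    n_s, n_t = len(Xs), len(Xt)
    W0 = base_weights(ys, n_t, alpha, beta)
    V = pairwise_weighted_pca(X, W0, n_components)   # start from the supervised solution
    prev, n_iter = None, 0
    for _ in range(max_iter):
        n_iter += 1
        Z = X @ V
        # squared distances from every target point to every source point
        d = ((Z[n_s:, None, :] - Z[None, :n_s, :]) ** 2).sum(axis=-1)
        nn = np.sort(np.argsort(d, axis=1)[:, :k], axis=1)
        if prev is not None and np.array_equal(nn, prev):
            break                                    # neighbour sets stopped changing
        prev = nn
        W = W0.copy()
        rows = np.repeat(np.arange(n_t), k) + n_s
        cols = nn.ravel()
        W[rows, cols] = -gamma                       # attract target to nearby source
        W[cols, rows] = -gamma
        V = pairwise_weighted_pca(X, W, n_components)
    return V, n_iter
```

With all off-diagonal weights equal to 1, `pairwise_weighted_pca` returns the ordinary PCA directions, which gives a convenient sanity check. Calling `dapca_sketch(Xs, ys, Xt)` returns a `(d, n_components)` projection matrix and the number of iterations used; the stopping rule here (unchanged neighbour sets, capped by `max_iter`) is a choice of this sketch and does not reproduce the convergence guarantee stated in the abstract.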

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/438b99d37d3b/entropy-25-00033-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/f46202ec81df/entropy-25-00033-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/0b4fa56c9de4/entropy-25-00033-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/e6f20618aeb2/entropy-25-00033-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/0c8515347985/entropy-25-00033-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/b018b0b2d7fc/entropy-25-00033-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54c/9858254/31e150648511/entropy-25-00033-g007.jpg
