
Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification.

Author information

Horenko Illia

Affiliation

Faculty of Informatics, Institute of Computing, Università della Svizzera Italiana, TI-6900 Lugano, Switzerland

Publication information

Proc Natl Acad Sci U S A. 2022 Mar 1;119(9). doi: 10.1073/pnas.2119659119.

PMID: 35197293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8917346/
Abstract

Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS dwells on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. An identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically symmetric Gaussians, used heuristically in many popular data analysis algorithms, represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data.
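The closed-form solution mentioned in the abstract can be illustrated with a minimal sketch. It assumes the regularized problem is to minimize sum_t w_t * loss_t + eps * sum_t w_t * log(w_t) over weights w on the probability simplex, whose minimizer is the Gibbs (softmax) distribution w_t proportional to exp(-loss_t / eps). The abstract does not state this formula, and the names `eos_weights`, `weighted_mean_fit`, and `eps`, as well as the alternating update for a location parameter, are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np


def eos_weights(losses, eps):
    """Minimizer of sum_t w_t*losses_t + eps*sum_t w_t*log(w_t) over the simplex.

    Given the per-sample losses, this costs O(T) for T samples and does not
    touch the data dimension.
    """
    losses = np.asarray(losses, dtype=float)
    # Shift by the minimum loss for numerical stability; the shift cancels
    # after normalization.
    w = np.exp(-(losses - losses.min()) / eps)
    return w / w.sum()


def weighted_mean_fit(x, eps, n_iter=50):
    """Alternate closed-form weights with a weighted-mean update.

    Uses the squared Euclidean distance to a single location parameter as
    the loss (an assumed toy model, chosen only for illustration).
    """
    x = np.asarray(x, dtype=float)
    theta = x.mean(axis=0)  # start from the ordinary, outlier-sensitive mean
    for _ in range(n_iter):
        losses = ((x - theta) ** 2).sum(axis=1)
        w = eos_weights(losses, eps)
        theta = w @ x  # weighted mean minimizes the weighted squared loss
    return theta, w


# Synthetic data: 95 inliers around the origin, 5 outliers far away.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(95, 2))
outliers = rng.normal(25.0, 1.0, size=(5, 2))
x = np.vstack([inliers, outliers])

theta, w = weighted_mean_fit(x, eps=5.0)
print("plain mean:        ", x.mean(axis=0))
print("entropy-weighted:  ", theta)
print("total outlier mass:", w[95:].sum())
```

In this sketch, larger values of `eps` push the weights toward uniform (recovering the ordinary mean), while smaller values concentrate them on low-loss points, so samples with large losses receive weights that are effectively zero.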


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/270c/8917346/f776489efdef/pnas.2119659119fig01.jpg


