Low-cost scalable discretization, prediction, and feature selection for complex systems.

Authors

Gerber S, Pospisil L, Navandar M, Horenko I

Affiliations

Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany.

Faculty of Informatics, Università della Svizzera italiana, Via G. Buffi 13, 6900 Lugano, Switzerland.

Publication

Sci Adv. 2020 Jan 29;6(5):eaaw0961. doi: 10.1126/sciadv.aaw0961. eCollection 2020 Jan.

DOI:10.1126/sciadv.aaw0961
PMID:32064328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6989146/
Abstract

Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost improved quality scalable probabilistic approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, parallel efficiency, and a linear scalability of iteration cost. Cross-validated applications of SPA to a range of large realistic data classification and prediction problems reveal marked cost and performance improvements. For example, SPA allows the data-driven next-day predictions of resimulated surface temperatures for Europe with the mean prediction error of 0.75°C on a common PC (being around 40% better in terms of errors and five to six orders of magnitude cheaper than with common computational instruments used by the weather services).
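
The abstract does not spell out the SPA objective, so the following is only an illustrative sketch and not the authors' implementation. It assumes the discretization step can be posed as a least-squares factorization X ≈ S Γ, where the columns of S are K "box" coordinates and each column of Γ is a probability vector (K-means would correspond to the special case of one-hot columns). The function names `project_to_simplex` and `spa_fit`, the projected-gradient Γ-step, and all parameter defaults are assumptions made for this example.

```python
import numpy as np

def project_to_simplex(G):
    """Euclidean projection of each column of G onto the probability simplex."""
    K, T = G.shape
    U = -np.sort(-G, axis=0)                   # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0           # running sums minus the target total of 1
    ks = np.arange(1, K + 1).reshape(-1, 1)
    rho = (U - css / ks > 0).sum(axis=0)       # number of entries that stay positive
    theta = css[rho - 1, np.arange(T)] / rho   # per-column shift
    return np.maximum(G - theta, 0.0)

def spa_fit(X, K, n_iter=50, n_inner=10, seed=0):
    """Alternating minimization of ||X - S @ Gamma||_F^2 (hypothetical sketch).

    X has shape (n_features, T). Returns S (n_features, K) and Gamma (K, T),
    with every column of Gamma non-negative and summing to 1.
    """
    rng = np.random.default_rng(seed)
    T = X.shape[1]
    S = X[:, rng.choice(T, size=K, replace=False)].astype(float)
    Gamma = project_to_simplex(rng.random((K, T)))
    for _ in range(n_iter):
        # Gamma-step: projected gradient with step 1/L, L = largest eigenvalue of S^T S.
        StS, StX = S.T @ S, S.T @ X
        L = np.linalg.norm(StS, 2) + 1e-12
        for _ in range(n_inner):
            Gamma = project_to_simplex(Gamma - (StS @ Gamma - StX) / L)
        # S-step: unconstrained least squares, solved via the pseudo-inverse.
        S = X @ Gamma.T @ np.linalg.pinv(Gamma @ Gamma.T)
    return S, Gamma

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2, 200))
    S, Gamma = spa_fit(X, K=3)
    print("relative error:", np.linalg.norm(X - S @ Gamma) / np.linalg.norm(X))
```

In this sketch each column of Γ can be updated independently of the others, which gives an intuition for the parallelizability the abstract mentions, and the per-iteration work grows linearly with the number of data points T.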

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0f0/6989146/d8a191259b3e/aaw0961-F1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0f0/6989146/b35dcaf05626/aaw0961-F2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0f0/6989146/bd555e94cd4b/aaw0961-F3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0f0/6989146/69cbac61e703/aaw0961-F4.jpg
