Low-cost scalable discretization, prediction, and feature selection for complex systems.

Author information

Gerber S, Pospisil L, Navandar M, Horenko I

Affiliations

Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany.

Faculty of Informatics, Università della Svizzera italiana, Via G. Buffi 13, 6900 Lugano, Switzerland.

Publication information

Sci Adv. 2020 Jan 29;6(5):eaaw0961. doi: 10.1126/sciadv.aaw0961. eCollection 2020 Jan.

Abstract

Finding reliable discrete approximations of complex systems is a key prerequisite for applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost, improved-quality scalable probabilistic approximation (SPA) algorithm that allows simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, its parallel efficiency, and the linear scalability of its iteration cost. Cross-validated applications of SPA to a range of large, realistic data classification and prediction problems reveal marked improvements in cost and performance. For example, SPA allows data-driven next-day predictions of resimulated surface temperatures for Europe with a mean prediction error of 0.75°C on a common PC, which is around 40% better in terms of error and five to six orders of magnitude cheaper than the common computational instruments used by weather services.
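
The abstract does not spell out the SPA model, so the following is only a hedged illustration of the general idea of a probabilistic (soft) discretization. It assumes the generic form X ≈ SΓ, where X is a D × T data matrix, S holds K "box" points as columns, and every column of Γ is a probability vector. K-means is the special case in which every column of Γ is one-hot. The names `spa_sketch` and `project_columns_to_simplex`, the alternating least-squares / projected-gradient solver, the parameters `n_iter` and `n_inner`, and the triangle test data are all hypothetical choices made for this sketch. It is not the authors' implementation and includes no regularization, feature-selection, or prediction step.

```python
import numpy as np


def project_columns_to_simplex(V):
    """Euclidean projection of every column of V onto the probability simplex
    {g : g >= 0, sum(g) = 1}, using the sort-based algorithm column-wise."""
    K, T = V.shape
    U = -np.sort(-V, axis=0)                       # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0               # cumulative sums minus the target sum 1
    j = np.arange(1, K + 1).reshape(-1, 1)
    active = U - css / j > 0                       # True on a prefix of each column
    rho = K - 1 - np.argmax(active[::-1], axis=0)  # last active index per column
    theta = css[rho, np.arange(T)] / (rho + 1)     # per-column shift
    return np.maximum(V - theta, 0.0)


def spa_sketch(X, K, n_iter=100, n_inner=10, seed=0):
    """Hypothetical sketch: approximate X (D x T) by S @ Gamma, where S is D x K
    and every column of Gamma (K x T) is a probability vector. Alternates an
    exact least-squares update of S with projected-gradient updates of Gamma."""
    rng = np.random.default_rng(seed)
    D, T = X.shape
    Gamma = rng.random((K, T))
    Gamma /= Gamma.sum(axis=0, keepdims=True)      # random column-stochastic start
    losses = []
    for _ in range(n_iter):
        # S-step: minimize ||X - S Gamma||_F^2 over S by solving Gamma^T S^T = X^T.
        S = np.linalg.lstsq(Gamma.T, X.T, rcond=None)[0].T
        # Gamma-step: step size 1/L with L = 2 * sigma_max(S)^2, the Lipschitz
        # constant of the gradient, so no projected step increases the loss.
        step = 1.0 / (2.0 * np.linalg.norm(S, 2) ** 2 + 1e-12)
        for _ in range(n_inner):
            grad = 2.0 * S.T @ (S @ Gamma - X)
            # Each column of Gamma is updated independently of the others.
            Gamma = project_columns_to_simplex(Gamma - step * grad)
        losses.append(float(np.linalg.norm(X - S @ Gamma) ** 2))
    return S, Gamma, losses


# Hypothetical test data: 500 points drawn inside a triangle in 2-D.
rng = np.random.default_rng(1)
V = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
X = V @ rng.dirichlet(np.ones(3), size=500).T
S, Gamma, losses = spa_sketch(X, K=3)
print("first loss:", losses[0], "final loss:", losses[-1])
```

Because the Γ-step treats each data point's column separately, its cost grows linearly with the number of data points T and the work can be split across points. This is one way such a formulation can scale; whether the authors' solver is organized this way is not stated in the abstract.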

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0f0/6989146/d8a191259b3e/aaw0961-F1.jpg