Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

Affiliations

The National Institutes for Quantum and Radiological Science and Technology (QST), National Institute of Radiological Sciences (NIRS), 4-9-1 Anagawa, Inage-ku, 263-8555 Chiba, Japan.

German Federal Office for Radiation Protection (BfS), Koepenicker Allee 120-130, Berlin 10318, Germany.

Publication information

Sci Total Environ. 2018 Jul 15;630:1155-1167. doi: 10.1016/j.scitotenv.2018.02.233. Epub 2018 Mar 7.

Abstract

Machine learning is a class of statistical techniques that has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. Our first purpose in this paper is to propose the application of machine learning to reconstruct incomplete or irregularly sampled time series of indoor radon (Rn). The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of the multivariate series, derive a dependence model and apply it to sections where the controls are available but the response (Rn) is not, and in this way complete the Rn series. Three machine learning techniques are applied in this study: random forest, its extension called the gradient boosting machine, and deep learning. For comparison, we apply classical multiple regression in a generalized linear model version. Model performance is evaluated with several metrics, and the gradient boosting machine is found to outperform the other techniques. Our second purpose is to show that, by applying learning machines, missing values or periods in Rn series can be reasonably reconstructed and resampled on a regular grid, provided that data on appropriate physical controls are available. The techniques also identify to what degree the assumed controls contribute to imputing missing Rn values. Our third purpose, no less important from the viewpoint of physics, is to identify to what degree physical (in this case environmental) variables are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that the variables contributing most to the Rn series reconstruction are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors.
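The workflow described in the abstract (learn on complete sections, predict Rn where only the controls are available, then ask which controls matter) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic and invented here, the variable names and coefficients are hypothetical, and the model is a small pure-Python gradient boosting of depth-1 regression stumps rather than the software or settings used in the paper, which the abstract does not specify. Predictor relevance is estimated with a simple permutation test, which may differ from the importance measure the authors used.

```python
import math
import random

def fit_stump(X, residuals):
    """Best single-feature threshold split (least squares) for the residuals."""
    n, total = len(X), sum(residuals)
    mean = total / n
    best = (0.0, 0, math.inf, mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximising this quantity is equivalent to minimising squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]

def fit_gbm(X, y, n_rounds=150, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, rate, stumps = model
    return base + rate * sum(l if row[j] <= t else r for j, t, l, r in stumps)

def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

# Synthetic daily data (invented for illustration, NOT the paper's measurements).
rng = random.Random(0)
names = ["temperature", "humidity", "pressure", "day_of_year"]
X, rn_true = [], []
for day in range(730):
    doy = day % 365 + 1
    season = math.sin(2 * math.pi * (doy - 110) / 365)
    temp = 12 + 10 * season + rng.gauss(0, 2)
    hum = 65 - 10 * season + rng.gauss(0, 5)
    pres = 1013 + rng.gauss(0, 8)
    X.append([temp, hum, pres, doy])
    rn_true.append(60 - 1.5 * temp + 0.3 * hum + rng.gauss(0, 5))

# Knock out one long gap plus scattered single days to mimic an incomplete series.
missing = set(range(300, 360)) | set(rng.sample(range(730), 70))
rn_obs = [None if i in missing else v for i, v in enumerate(rn_true)]

# Learn on complete rows, then predict Rn where only the controls are available.
train = [i for i, v in enumerate(rn_obs) if v is not None]
gaps = [i for i, v in enumerate(rn_obs) if v is None]
model = fit_gbm([X[i] for i in train], [rn_obs[i] for i in train])
rn_filled = [v if v is not None else predict(model, X[i]) for i, v in enumerate(rn_obs)]

# Score the reconstruction against the hidden truth and a mean-fill baseline.
truth = [rn_true[i] for i in gaps]
rmse_gbm = rmse([rn_filled[i] for i in gaps], truth)
mean_obs = sum(rn_obs[i] for i in train) / len(train)
rmse_mean = rmse([mean_obs] * len(gaps), truth)

# Permutation importance: how much the gap RMSE grows when one control is shuffled.
importance = {}
for j, name in enumerate(names):
    shuffled = [X[i][j] for i in gaps]
    rng.shuffle(shuffled)
    rows = [X[i][:j] + [s] + X[i][j + 1:] for i, s in zip(gaps, shuffled)]
    importance[name] = rmse([predict(model, r) for r in rows], truth) - rmse_gbm

print(f"RMSE gradient boosting: {rmse_gbm:.2f}, mean fill: {rmse_mean:.2f}")
print({k: round(v, 2) for k, v in importance.items()})
```

In this toy setup the true Rn values in the gaps are known because the data were generated, so the reconstruction error can be measured directly. With real measurements, the same check would require deliberately withholding part of the observed series. A "day_of_year" column is included alongside the physical controls to mirror the proxy predictor mentioned in the abstract.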

