Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

Author information

The National Institutes for Quantum and Radiological Science and Technology (QST), National Institute of Radiological Sciences (NIRS), 4-9-1 Anagawa, Inage-ku, 263-8555 Chiba, Japan.

German Federal Office for Radiation Protection (BfS), Koepenicker Allee 120-130, Berlin 10318, Germany.

Publication information

Sci Total Environ. 2018 Jul 15;630:1155-1167. doi: 10.1016/j.scitotenv.2018.02.233. Epub 2018 Mar 7.

DOI:10.1016/j.scitotenv.2018.02.233

PMID:29554737

Abstract

Machine learning is a class of statistical techniques that has proven to be a powerful tool for modelling the behaviour of complex systems in which response quantities depend on assumed controls, or predictors, in a complicated way. Our first purpose in this paper is to propose the application of machine learning to reconstruct incomplete or irregularly sampled indoor radon (Rn) time series. The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of the multivariate series, derive a dependence model, and apply it to sections where the controls are available but the response (Rn) is not, thereby completing the Rn series. Three machine learning techniques are applied in this study: random forest, its extension called the gradient boosting machine, and deep learning. For comparison, we apply classical multiple regression in a generalized linear model version. Model performance is evaluated with several metrics, and the gradient boosting machine is found to be superior to the other techniques. Our second purpose is to show that, by applying learning machines, missing data or missing periods in an Rn series can be reasonably reconstructed and resampled on a regular grid, provided that data for appropriate physical controls are available. The techniques also identify the degree to which the assumed controls contribute to imputing missing Rn values. Our third purpose, no less important from the viewpoint of physics, is to identify the degree to which physical (in this case environmental) variables are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that the variables that contribute most to the Rn series reconstruction are temperature, relative humidity, and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors.
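
The workflow described in the abstract (train on complete sections, predict Rn where only the controls are available, then inspect which controls matter) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a synthetic daily series with made-up coefficients linking Rn to temperature and humidity, gaps placed at random, and a deliberately simplified pure-Python gradient boosting of one-split regression trees ("stumps") standing in for a full gradient boosting machine. All function and variable names are hypothetical.

```python
import math
import random

def fit_stump(X, r, max_thresholds=16):
    """Return (gain, feature, threshold, left_mean, right_mean) of the best split."""
    base = sum(v * v for v in r) - sum(r) ** 2 / len(r)  # SSE before splitting
    best = (0.0, 0, math.inf, sum(r) / len(r), 0.0)      # fallback: no split
    for j in range(len(X[0])):
        vals = sorted(set(row[j] for row in X))
        mids = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        step = max(1, len(mids) // max_thresholds)       # thin out candidates
        for t in mids[::step]:
            left = [v for row, v in zip(X, r) if row[j] <= t]
            right = [v for row, v in zip(X, r) if row[j] > t]
            sse = (sum(v * v for v in left) - sum(left) ** 2 / len(left)
                   + sum(v * v for v in right) - sum(right) ** 2 / len(right))
            if base - sse > best[0]:
                best = (base - sse, j, t,
                        sum(left) / len(left), sum(right) / len(right))
    return best

def fit_boosting(X, y, rounds=60, lr=0.2):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps, gain = [], [0.0] * len(X[0])
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        g, j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        gain[j] += g                                     # SSE reduction per feature
        pred = [p + lr * (ml if row[j] <= t else mr) for p, row in zip(pred, X)]
    total = sum(gain) or 1.0
    return f0, lr, stumps, [g / total for g in gain]

def predict(model, row):
    f0, lr, stumps, _ = model
    return f0 + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)

# Synthetic data: the coefficients below are invented for illustration only.
rng = random.Random(0)
names = ["temperature", "humidity", "day_of_year"]
rows, rn_true = [], []
for day in range(200):
    temp = 10 + 10 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 1)
    hum = 60 + rng.gauss(0, 5)
    rows.append([temp, hum, float(day)])
    rn_true.append(80 - 2 * temp + 0.3 * hum + rng.gauss(0, 1.5))

# Knock out roughly a quarter of the Rn values; the controls stay available.
rn_obs = [None if rng.random() < 0.25 else v for v in rn_true]
train = [i for i, v in enumerate(rn_obs) if v is not None]
gaps = [i for i, v in enumerate(rn_obs) if v is None]

# Learn from the complete rows, then fill the gaps from the controls alone.
model = fit_boosting([rows[i] for i in train], [rn_obs[i] for i in train])
rn_filled = [v if v is not None else predict(model, rows[i])
             for i, v in enumerate(rn_obs)]

def rmse(pred):
    """Root-mean-square error on the gap positions only."""
    return math.sqrt(sum((pred[i] - rn_true[i]) ** 2 for i in gaps) / len(gaps))

mean_fill = sum(rn_obs[i] for i in train) / len(train)
rmse_model = rmse(rn_filled)
rmse_mean = rmse([mean_fill] * len(rn_true))             # naive baseline
importance = dict(zip(names, model[3]))                  # share of total gain

print(f"RMSE boosted stumps: {rmse_model:.2f}, mean fill: {rmse_mean:.2f}")
print("Relative importance:", {k: round(v, 3) for k, v in importance.items()})
```

Because the series is synthetic, the true Rn values at the gaps are known and the reconstruction error can be compared against a naive mean fill. With real measurements one would instead hold out part of the observed data for validation. The gain-based importance shown here is only one of several possible importance measures and need not match the one used in the paper.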

Similar articles

Mapping the geogenic radon potential for Germany by machine learning.

Sci Total Environ. 2021 Feb 1;754:142291. doi: 10.1016/j.scitotenv.2020.142291. Epub 2020 Sep 14.

Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data.

PLoS One. 2022 Jan 13;17(1):e0262131. doi: 10.1371/journal.pone.0262131. eCollection 2022.

Application of spectral decomposition of ²²²Rn activity concentration signal series measured in Niedźwiedzia Cave to identification of mechanisms responsible for different time-period variations.

Appl Radiat Isot. 2015 Oct;104:74-86. doi: 10.1016/j.apradiso.2015.06.029. Epub 2015 Jun 24.

Testing of Rn application for recognizing tectonic events observed on water-tube tiltmeters in underground Geodynamic Laboratory of Space Research Centre at Książ (the Sudetes, SW Poland).

Appl Radiat Isot. 2020 Sep;163:108967. doi: 10.1016/j.apradiso.2019.108967. Epub 2019 Nov 1.

A complexity measure based method for studying the dependance of 222Rn concentration time series on indoor air temperature and humidity.

Appl Radiat Isot. 2014 Feb;84:27-32. doi: 10.1016/j.apradiso.2013.10.016. Epub 2013 Nov 5.

A comparison between discrete and continuous time Bayesian networks in learning from clinical time series data with irregularity.

Artif Intell Med. 2019 Apr;95:104-117. doi: 10.1016/j.artmed.2018.10.002. Epub 2019 Jan 22.

Machine learning for the analysis of indoor radon distribution, compared with ordinary kriging.

Radiat Prot Dosimetry. 2009 Dec;137(3-4):324-8. doi: 10.1093/rpd/ncp254. Epub 2009 Nov 13.

Machine learning in environmental radon science.

Appl Radiat Isot. 2023 Apr;194:110684. doi: 10.1016/j.apradiso.2023.110684. Epub 2023 Jan 14.

Investigations on indoor radon in Austria, Part 1: Seasonality of indoor radon concentration.

J Environ Radioact. 2007;98(3):329-45. doi: 10.1016/j.jenvrad.2007.06.006. Epub 2007 Aug 17.

Cited by

Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data.

PLoS One. 2022 Jan 13;17(1):e0262131. doi: 10.1371/journal.pone.0262131. eCollection 2022.

Radiological Assessment of Indoor Radon and Thoron Concentrations and Indoor Radon Map of Dwellings in Mashhad, Iran.

Int J Environ Res Public Health. 2020 Dec 28;18(1):141. doi: 10.3390/ijerph18010141.

Development of a Geogenic Radon Hazard Index-Concept, History, Experiences.

Int J Environ Res Public Health. 2020 Jun 10;17(11):4134. doi: 10.3390/ijerph17114134.

Automated classification platform for the identification of otitis media using optical coherence tomography.

NPJ Digit Med. 2019 Mar 28;2:22. doi: 10.1038/s41746-019-0094-0. eCollection 2019.
