Developing reliable hourly electricity demand data through screening and imputation.

Affiliations

Carnegie Institution for Science, Stanford, United States.

University of California, Irvine, Irvine, United States.

Publication information

Sci Data. 2020 May 26;7(1):155. doi: 10.1038/s41597-020-0483-x.

Abstract

Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in the contiguous United States. As of September 2019, we find 2.2% of the demand data in their database are missing. Additionally, 0.5% of reported quantities are either negative values or are otherwise identified as outliers. With the goal of attaining non-missing, continuous, and physically plausible demand data to facilitate analysis, we developed a screening process to identify anomalous values. We then applied a Multiple Imputation by Chained Equations (MICE) technique to impute replacements for missing and anomalous values. We conduct cross-validation on the MICE technique by marking subsets of plausible data as missing, and using the remaining data to predict this "missing" data. The mean absolute percentage error of imputed values is 3.5% across all BAs. The cleaned data are published and available open access: https://doi.org/10.5281/zenodo.3690240.
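The screen, impute, and cross-validate workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the factor-of-median outlier rule, and the sample values are all hypothetical, and simple linear interpolation stands in for MICE so that the example stays dependency-free. Only the overall shape follows the abstract: flag implausible values, fill them, then hide known-good values and score the imputer with the mean absolute percentage error (MAPE).

```python
from statistics import median


def screen(demand, outlier_factor=10.0):
    """Return a copy with missing, negative, or extreme values set to None.

    The factor-of-median threshold is a hypothetical rule chosen for
    illustration; it is not the screening procedure used in the paper.
    """
    valid = [v for v in demand if v is not None and v >= 0]
    if not valid:
        raise ValueError("no plausible values to screen against")
    limit = outlier_factor * median(valid)
    return [v if v is not None and 0 <= v <= limit else None for v in demand]


def interpolate(series):
    """Fill None gaps by linear interpolation (a stand-in imputer, not MICE).

    Gaps at either end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def cross_validate(series, holdout_indices):
    """Mark known-good values as missing, re-impute them, and return the MAPE."""
    masked = list(series)
    for i in holdout_indices:
        masked[i] = None
    imputed = interpolate(masked)
    actual = [series[i] for i in holdout_indices]
    predicted = [imputed[i] for i in holdout_indices]
    return mape(actual, predicted)


# Hypothetical hourly demand values (MW) with a gap, a negative, and a spike.
raw = [100.0, 110.0, None, 130.0, -5.0, 150.0, 9000.0, 170.0]
cleaned = interpolate(screen(raw))
print(cleaned)
print(cross_validate(cleaned, holdout_indices=[1, 5]))
```

A full reproduction would replace `interpolate` with a chained-equations imputer that draws on related series as predictors, and would hold out only values that passed screening, as the abstract describes.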

