Voyez Antonin, Allard Tristan, Avoine Gildas, Cauchois Pierre, Fromont Elisa, Simonin Matthieu
University of Rennes, CNRS, IRISA, Rennes, France.
Enedis, Puteaux, France.
Sci Rep. 2025 May 19;15(1):17391. doi: 10.1038/s41598-024-78285-7.
The collection of electrical consumption time series through smart meters is growing with ambitious nationwide smart grid programs. This data is both highly sensitive and highly valuable: strict personal-data laws protect it, while open-data laws aim to make it public after a privacy-preserving data publishing process. In this work, we study the uniqueness of large-scale, real-life, fine-grained electrical consumption time series and show its link to privacy threats. Our results show a worryingly high uniqueness rate in such datasets. In particular, we show that knowing 5 consecutive electrical measurements is enough to re-identify, on average, more than 90% of households in our dataset of 2.5M half-hourly electrical time series. Moreover, uniqueness remains high even when the data is severely degraded. For example, when the data is rounded to the nearest 100 watts, knowing 7 consecutive electrical measurements is enough to re-identify, on average, more than 40% of the households in the same dataset. We also study the relationships between uniqueness and entropy, between uniqueness and electrical consumption, and between electrical consumption and temperature, and show that each pair is strongly correlated.
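The uniqueness measure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the exact definition used here (a household counts as unique when no other household shows the same k consecutive values at the same time positions) are assumptions made for illustration only.

```python
import math
from collections import Counter


def uniqueness_rate(series, start, k, round_to=None):
    """Fraction of series whose k consecutive values beginning at `start`
    are shared with no other series (assumed definition of uniqueness)."""
    def window(s):
        values = s[start:start + k]
        if round_to is not None:
            # Round each non-negative value to the nearest multiple of round_to.
            values = [round_to * math.floor(v / round_to + 0.5) for v in values]
        return tuple(values)

    windows = [window(s) for s in series]
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(series)


def average_uniqueness(series, k, round_to=None):
    """Average the uniqueness rate over every possible starting position."""
    length = min(len(s) for s in series)
    starts = range(length - k + 1)
    rates = [uniqueness_rate(series, t, k, round_to) for t in starts]
    return sum(rates) / len(rates)


# Hypothetical toy data: four households, four half-hourly readings in watts.
toy_series = [
    [120, 340, 560, 180],
    [120, 340, 610, 180],
    [130, 310, 560, 900],
    [480, 310, 560, 900],
]

print(uniqueness_rate(toy_series, start=0, k=1))
print(uniqueness_rate(toy_series, start=0, k=3))
print(uniqueness_rate(toy_series, start=0, k=3, round_to=100))
print(average_uniqueness(toy_series, k=3))
```

On this toy data, lengthening the window from 1 to 3 readings raises the uniqueness rate, while rounding to the nearest 100 watts lowers it. This is the same qualitative trade-off the abstract reports at scale.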