Castellini Alberto, Bloisi Domenico, Blum Jason, Masillo Francesco, Farinelli Alessandro
Department of Computer Science, University of Verona, Strada le Grazie 15, 37134 Verona, Italy.
Department of Mathematics, Computer Science, and Economics, University of Basilicata, Viale dell'Ateneo Lucano 10, 85100 Potenza, Italy.
Data Brief. 2020 Mar 19;30:105436. doi: 10.1016/j.dib.2020.105436. eCollection 2020 Jun.
Sensor data generated by intelligent systems, such as autonomous robots, smart buildings and other systems based on artificial intelligence, represent valuable sources of knowledge in today's data-driven society, since they contain information about the situations these systems face during their operation. These data are usually multivariate time series, since modern technologies enable the simultaneous acquisition of multiple signals over long periods of time. In this paper we present a dataset containing sensor traces from six data acquisition campaigns performed by autonomous aquatic drones involved in water monitoring. A total of 5.6 h of navigation is available, with data coming from both lakes and rivers, and from different locations in Italy and Spain. The monitored variables concern both the internal state of the drone (e.g., battery voltage, GPS position and signals to propellers) and the state of the water (e.g., temperature, dissolved oxygen and electrical conductivity). Data were collected in the context of the EU-funded Horizon 2020 project INTCATCH (http://www.intcatch.eu), which aims to develop a new paradigm for monitoring the water quality of catchments. The aquatic drones used for data acquisition are Platypus Lutra boats. Both autonomous and manual driving are used in different parts of the navigation. The dataset is analyzed in the paper "Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework" [1] by means of recent time series clustering/segmentation techniques to extract data-driven models of the situations faced by the drones in the data acquisition campaigns. These data have strong potential for reuse in other kinds of data analysis and in the evaluation of machine learning methods on real-world datasets [2]. Moreover, we consider this dataset also valuable for the variety of situations faced by the drone, from which machine learning techniques can learn behavioral patterns or detect anomalous activities.
We also provide manual labels for some known states of the drones, such as drone inside/outside the water, upstream/downstream navigation, manual/autonomous drive, and drone turning, which serve as ground truth for validation purposes. Finally, the real-world nature of the dataset makes it more challenging for machine learning methods, because it contains noisy samples collected while the drone was exposed to atmospheric agents and uncertain water flow conditions.