Steinbrener Jan, Dimitrievska Vesna, Pittino Federico, Starmans Frans, Waldner Roland, Holzbauer Jürgen, Arnold Thomas
Control of Networked Systems Group, University of Klagenfurt, Universitaetsstr. 65-67, Klagenfurt, 9020, Carinthia, Austria.
Silicon Austria Labs GmbH, Europastraße 12, Villach, 9524, Carinthia, Austria.
Heliyon. 2023 Mar 21;9(4):e14722. doi: 10.1016/j.heliyon.2023.e14722. eCollection 2023 Apr.
We present a novel approach for extracting metric volume information of fruits and vegetables from short monocular video sequences and associated inertial data recorded with a hand-held smartphone. Estimated segmentation masks from a pre-trained object detector are fused with the predicted change in relative pose obtained from the inertial data to predict the class and volume of the objects of interest. Our approach works with simple RGB video frames and inertial data, both of which are readily available from modern smartphones. It does not require reference objects of known size in the video frames. Using a balanced validation dataset, we achieve a classification accuracy of 95% and a mean absolute percentage error for the volume prediction of 16% on untrained objects, which is comparable to state-of-the-art results that require more elaborate data recording setups. A very accurate estimation of the model uncertainty is achieved through ensembling and the use of a Gaussian negative log-likelihood loss. The dataset used in our experiments, including ground-truth volume information, is available at https://sst.aau.at/cns/datasets.
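The abstract names three quantities without defining them: the Gaussian negative log-likelihood loss, ensembling, and the mean absolute percentage error. The following is a minimal sketch of their standard textbook forms, not the authors' implementation. The function names, the `eps` variance floor, and the choice to combine ensemble members as a uniformly weighted Gaussian mixture are all assumptions made for illustration; the abstract does not state how the members are combined.

```python
import math


def gaussian_nll(y, mu, var, eps=1e-6):
    """Gaussian negative log-likelihood of one target y under N(mu, var).

    The constant 0.5*log(2*pi) is omitted. `eps` is an assumed floor
    that keeps the variance strictly positive.
    """
    var = max(var, eps)
    return 0.5 * (math.log(var) + (y - mu) ** 2 / var)


def combine_ensemble(means, variances):
    """Combine per-member (mean, variance) predictions.

    Assumption: members are treated as a uniformly weighted Gaussian
    mixture, so the combined variance is the average member variance
    plus the spread of the member means.
    """
    m = len(means)
    mu = sum(means) / m
    second_moment = sum(v + mu_i ** 2 for mu_i, v in zip(means, variances)) / m
    return mu, second_moment - mu ** 2


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical volumes in millilitres, purely illustrative.
    mu, var = combine_ensemble([100.0, 120.0], [25.0, 25.0])
    print(mu, var)
    print(gaussian_nll(105.0, mu, var))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Under this mixture assumption, disagreement between ensemble members enlarges the predicted variance, which is one common way an ensemble can yield a calibrated uncertainty estimate.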