Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 23741, Taiwan.
Sensors (Basel). 2021 Jul 21;21(15):4966. doi: 10.3390/s21154966.
Public aquariums and similar institutions often use video to monitor the behavior, health, and status of aquatic organisms in their environments. This footage takes up a sizeable amount of space, and autoencoders can be used to reduce its file size for efficient storage. The autoencoder neural network is an emerging technique that uses the latent space extracted from an input source to reduce the image size for storage, and then reconstructs the source within an acceptable loss range for use. To meet an aquarium's practical needs, the autoencoder must have easily maintainable code, consume little power, be easy to adopt, and not require substantial memory or processing power. Conventional autoencoder configurations often deliver results beyond an aquarium's needs at the cost of being too complex for the aquarium's architecture to handle, and few take low-contrast sources into consideration. Thus, in this instance, "keeping it simple" is the ideal approach to the autoencoder's model design. This paper proposes a practical approach catered to an aquarium's specific needs through the configuration of autoencoder parameters. It first explores the differences between two of the most widely applied autoencoder approaches, the Multilayer Perceptron (MLP) and the Convolutional Neural Network (CNN), to identify the more appropriate one. The paper concludes that while both approaches (with proper configuration and image preprocessing) can reduce the dimensionality and the visual noise of the low-contrast images gathered from aquatic video footage, the CNN approach is more suitable for an aquarium's architecture. As an unexpected finding of the experiments, the paper also discovered that by manipulating the formula for the MLP approach, the autoencoder could generate a denoised differential image that contains sharper visual information that is more useful to an aquarium's operation. Lastly, the paper found that proper image preprocessing prior to applying the autoencoder led to better model convergence and prediction results, as demonstrated both visually and numerically in the experiments. The paper concludes that by combining the denoising effect of the MLP, the CNN's ability to manage memory consumption, and proper image preprocessing, the specific practical needs of an aquarium can be adeptly fulfilled.
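The pipeline described above (preprocess low-contrast frames, compress them to a latent code, reconstruct them) can be illustrated with a minimal NumPy sketch. This is not the paper's model or configuration. The synthetic frames, the min-max contrast stretch, the single sigmoid hidden layer, and every name and hyperparameter below (make_low_contrast_frames, stretch_contrast, latent_dim, lr, epochs) are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

def make_low_contrast_frames(n=64, size=8, seed=0):
    """Hypothetical stand-in for footage: a faint blob on a grey background.
    Grey levels stay in a narrow band (roughly 100..115 out of 0..255)."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    centers = rng.integers(2, size - 2, size=(n, 2))
    frames = np.stack([
        100.0 + 15.0 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 4.0)
        for cy, cx in centers
    ])
    return frames + rng.normal(0.0, 1.0, frames.shape)  # mild sensor noise

def stretch_contrast(img):
    """Min-max rescale to [0, 1]; one simple preprocessing choice (assumed)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, params):
    """Map flattened frames to the latent code."""
    w1, b1, _, _ = params
    return sigmoid(x @ w1 + b1)

def decode(code, params):
    """Reconstruct flattened frames from the latent code."""
    _, _, w2, b2 = params
    return sigmoid(code @ w2 + b2)

def train_autoencoder(x, latent_dim=8, epochs=300, lr=5.0, seed=0):
    """One-hidden-layer MLP autoencoder trained by full-batch gradient descent
    on the mean squared reconstruction error. Returns (params, loss_history)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    w2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)          # encoder output (latent code)
        y = sigmoid(h @ w2 + b2)          # decoder output (reconstruction)
        err = y - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE through both sigmoid layers.
        dz2 = (2.0 / (n * d)) * err * y * (1.0 - y)
        dz1 = (dz2 @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum(axis=0)
        w1 -= lr * (x.T @ dz1)
        b1 -= lr * dz1.sum(axis=0)
    return (w1, b1, w2, b2), losses

# Usage: 64-pixel frames are stored as 8-value codes, then reconstructed.
frames = make_low_contrast_frames()
x = stretch_contrast(frames).reshape(len(frames), -1)
params, losses = train_autoencoder(x)
code = encode(x, params)
reconstruction = decode(code, params).reshape(frames.shape)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}, code shape: {code.shape}")
```

In this sketch the storage saving comes from keeping only the latent code (8 values per frame instead of 64) together with the shared decoder weights. A convolutional variant would replace the dense layers with convolution and pooling layers.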