Department of CSE, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India.
Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab, India.
Sci Rep. 2024 Mar 19;14(1):6589. doi: 10.1038/s41598-024-57077-z.
Identifying and recognizing food on the basis of its eating sounds is a challenging task, yet it plays an important role in avoiding allergenic foods, supporting dietary preferences for people on restricted diets, showcasing cultural significance, etc. In this research paper, the aim is to design a novel methodology that helps to identify food items by analyzing their eating sounds using various deep learning models. To achieve this objective, a system has been proposed that extracts meaningful features from food-eating sounds with the help of signal processing techniques and deep learning models for classifying them into their respective food classes. Initially, 1200 labeled audio files covering 20 food items have been collected and visualized to find relationships between the sound files of different food items. Later, to extract meaningful features, various techniques such as spectrograms, spectral rolloff, spectral bandwidth, and mel-frequency cepstral coefficients are used to clean the audio files as well as to capture the unique characteristics of different food items. In the next phase, various deep learning models such as GRU, LSTM, InceptionResNetV2, and a customized CNN model have been trained to learn spectral and temporal patterns in the audio signals. Besides this, hybridized models, i.e., Bidirectional LSTM + GRU, RNN + Bidirectional LSTM, and RNN + Bidirectional GRU, have also been evaluated on the same labeled data in order to associate particular sound patterns with their corresponding food classes. During evaluation, the highest accuracy was obtained by GRU with 99.28%, the highest precision and F1 score by Bidirectional LSTM + GRU with 97.7% and 97.3%, respectively, and the highest recall by RNN + Bidirectional LSTM with 97.45%. The results of this study demonstrate that deep learning models have the potential to precisely identify foods on the basis of their eating sounds.
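The spectral rolloff and spectral bandwidth features mentioned in the abstract are standard signal-processing quantities. The following NumPy sketch illustrates how they can be computed for a single audio frame; it is a minimal illustration of the general technique, not the authors' implementation (in practice a library such as librosa computes these framewise across a whole recording):

```python
import numpy as np

def spectral_features(frame, sr, rolloff_pct=0.85):
    """Compute spectral rolloff and bandwidth for one audio frame."""
    mag = np.abs(np.fft.rfft(frame))                    # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)     # bin frequencies in Hz
    # Rolloff: frequency below which rolloff_pct of the spectral magnitude lies
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]
    # Bandwidth: magnitude-weighted standard deviation around the spectral centroid
    centroid = np.sum(freqs * mag) / np.sum(mag)
    bandwidth = np.sqrt(np.sum(mag * (freqs - centroid) ** 2) / np.sum(mag))
    return rolloff, bandwidth

# Sanity check on a synthetic pure tone: energy concentrates at 440 Hz,
# so the rolloff lands near 440 Hz and the bandwidth is close to zero.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
rolloff, bandwidth = spectral_features(tone, sr)
```

A crisp food (e.g. a chip) would show a high rolloff and wide bandwidth during the bite transient, whereas a soft food concentrates energy at low frequencies, which is what makes such features discriminative for eating-sound classification.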