Yamagata Koichi, Kwon Jinhwan, Kawashima Takuya, Shimoda Wataru, Sakamoto Maki
Graduate School of Informatics and Engineering, The University of Electro-Communications, Chofu, Japan.
Department of Education, Kyoto University of Education, Kyoto, Japan.
Front Psychol. 2021 Oct 7;12:654779. doi: 10.3389/fpsyg.2021.654779. eCollection 2021.
The major goals of texture research in computer vision are to understand, model, and process texture and, ultimately, to simulate human visual information processing using computer technologies. The field of computer vision has witnessed remarkable advances in material recognition using deep convolutional neural networks (DCNNs), which have enabled various computer vision applications, such as self-driving cars, facial and gesture recognition, and automatic number plate recognition. However, it remains difficult for computer vision to "express" texture as human beings do, because texture description is ambiguous and has no definitive correct or incorrect answer. In this paper, we develop a computer vision method using a DCNN that expresses the texture of materials. To achieve this goal, we focus on Japanese "sound-symbolic" words, which can describe differences in texture sensation at a fine resolution and are known to have strong and systematic sensory-sound associations. Because the phonemes of Japanese sound-symbolic words characterize categories of texture sensations, our method generates the phonemes and structures that make up sound-symbolic words probabilistically corresponding to the input images. In our evaluation, the sound-symbolic words output by our system achieved an accuracy of about 80%.
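The generation step described above (choosing phonemes and a word structure that probabilistically correspond to an image) can be illustrated with a minimal sketch. It assumes, hypothetically, that a DCNN has already produced one probability distribution per phoneme slot in a reduplicated C1V1C2V2 form, a common shape for Japanese sound-symbolic words such as "sarasara" and "fuwafuwa". The function name assemble_word, the slot layout, and the probabilities in the usage example are illustrative assumptions; the paper's actual phoneme categories, word structures, and network outputs are not reproduced here.

```python
import random
from typing import Dict, List, Optional, Tuple

# ASSUMPTION (illustrative only): a fixed C1 V1 C2 V2 slot layout whose
# result is reduplicated, e.g. "sara" -> "sarasara". Not the paper's scheme.
SLOT_NAMES = ("C1", "V1", "C2", "V2")


def _check_slots(slots: List[Dict[str, float]]) -> None:
    """Require one non-empty probability distribution per slot."""
    if len(slots) != len(SLOT_NAMES):
        raise ValueError(f"expected {len(SLOT_NAMES)} slots {SLOT_NAMES}")
    for name, dist in zip(SLOT_NAMES, slots):
        if not dist or abs(sum(dist.values()) - 1.0) > 1e-6:
            raise ValueError(f"slot {name} is not a probability distribution")


def assemble_word(
    slots: List[Dict[str, float]], rng: Optional[random.Random] = None
) -> Tuple[str, float]:
    """Pick one phoneme per slot and reduplicate the resulting unit.

    With rng=None the most probable phoneme is taken in each slot (greedy);
    otherwise each slot is sampled in proportion to its probabilities.
    Returns the romanized word and the product of the chosen probabilities.
    """
    _check_slots(slots)
    chosen: List[str] = []
    score = 1.0
    for dist in slots:
        if rng is None:
            phoneme = max(dist, key=dist.get)
        else:
            phoneme = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        chosen.append(phoneme)
        score *= dist[phoneme]
    unit = "".join(chosen)  # naive romanization, no Hepburn spelling rules
    return unit + unit, score


# Hypothetical per-slot probabilities standing in for DCNN softmax outputs.
slots = [
    {"s": 0.7, "f": 0.2, "t": 0.1},  # C1
    {"a": 0.8, "u": 0.2},            # V1
    {"r": 0.6, "w": 0.4},            # C2
    {"a": 0.9, "u": 0.1},            # V2
]
word, score = assemble_word(slots)
print(word, round(score, 4))  # sarasara 0.3024
print(assemble_word(slots, rng=random.Random(0))[0])  # sampled; depends on the seed
```

Greedy selection returns the single most probable word, while passing a seeded random.Random samples each slot in proportion to its probability, which is one way to produce varied descriptions that still follow the predicted distributions.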