Kramer Robin S S
University of Lincoln, UK.
Perception. 2025 Jan;54(1):65-68. doi: 10.1177/03010066241295992. Epub 2024 Nov 5.
ChatGPT's large language model, GPT-4V, has been trained on vast numbers of image-text pairs and is therefore capable of processing visual input. This model operates very differently from current state-of-the-art neural networks designed specifically for face perception and so I chose to investigate whether ChatGPT could also be applied to this domain. With this aim, I focussed on the task of face matching, that is, deciding whether two photographs showed the same person or not. Across six different tests, ChatGPT demonstrated performance that was comparable with human accuracies despite being a domain-general 'virtual assistant' rather than a specialised tool for face processing. This perhaps surprising result identifies a new avenue for exploration in this field, while further research should explore the boundaries of ChatGPT's ability, along with how its errors may relate to those made by humans.