Department of Psychology, School of Humanities & Social Science, University of Science & Technology of China, Hefei, Anhui, 230026, China.
Department of Radiology, the First Affiliated Hospital of USTC, School of Life Science, Division of Life Science and Medicine, University of Science & Technology of China, Hefei, 230027, China.
Adv Sci (Weinh). 2023 Apr;10(12):e2203990. doi: 10.1002/advs.202203990. Epub 2023 Feb 7.
Natural language processing (NLP) is central to communication with machines and among ourselves, and the NLP research field has long sought to produce human-quality language. Identifying informative criteria for measuring the quality of NLP-produced language will support the development of ever-better NLP tools. The authors hypothesize that neural activity in the mentalizing network can be used to distinguish NLP-produced language from human-produced language, even in cases where human judges cannot subjectively distinguish the language source. NLP-produced language is generated with the social chatbots Google Meena (English) and Microsoft XiaoIce (Chinese). Behavioral tests reveal that the variance of personality perceived from chatbot chats is larger than that perceived from human chats, suggesting that chatbot language usage patterns are not stable. Neuroimaging analyses, using an identity rating task with functional magnetic resonance imaging, reveal distinct patterns of brain activity in the mentalizing network, including the dorsomedial prefrontal cortex (DMPFC) and right temporoparietal junction (rTPJ), in response to chatbot versus human chats that cannot be distinguished subjectively. This study illustrates a promising empirical basis for measuring the quality of NLP-produced language: adding a judge's implicit perception as an additional criterion.