Qu Youzhi, Wei Chen, Du Penghui, Che Wenxin, Zhang Chi, Ouyang Wanli, Bian Yatao, Xu Feiyang, Hu Bin, Du Kai, Wu Haiyan, Liu Jia, Liu Quanying
Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China.
Shanghai AI Laboratory, Shanghai 200232, China.
iScience. 2024 Mar 22;27(4):109550. doi: 10.1016/j.isci.2024.109550. eCollection 2024 Apr 19.
During the evolution of large models, performance evaluation is necessary for assessing their capabilities. However, current model evaluations mainly rely on specific tasks and datasets, lacking a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, covering crystallized, fluid, social, and embodied intelligence. The AGI tests consist of well-designed cognitive tests adopted from human intelligence tests, which are then naturally encapsulated in an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models, and we emphasize the need to interpret test results in order to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate the integration of large models into human society.