
Risks of AI scientists: prioritizing safeguarding over autonomy.

Author information

Tang Xiangru, Jin Qiao, Zhu Kunlun, Yuan Tongxin, Zhang Yichi, Zhou Wangchunshu, Qu Meng, Zhao Yilun, Tang Jian, Zhang Zhuosheng, Cohan Arman, Greenbaum Dov, Lu Zhiyong, Gerstein Mark

Affiliations

Department of Computer Science, Yale University, New Haven, CT, USA.

Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Publication information

Nat Commun. 2025 Sep 18;16(1):8317. doi: 10.1038/s41467-025-63913-1.

Abstract

AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding AI scientists and advocate for the development of improved models, robust benchmarks, and comprehensive regulations.
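The triadic framework is described only conceptually in the abstract. As a rough illustration of how its three layers (human regulation, agent alignment, and environmental feedback) might compose in software, the following Python sketch gates each action an AI scientist proposes through sequential checks. This is a minimal sketch of ours, not an implementation from the paper: every name here (ProposedAction, RESTRICTED_DOMAINS, the stub check functions) is a hypothetical placeholder.

```python
# Illustrative sketch: the triadic safeguarding framework as three veto layers
# gating each action an AI scientist proposes. All identifiers are hypothetical
# placeholders, not names from the paper.
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    description: str   # e.g. "synthesize compound X"
    domain: str        # scientific domain, e.g. "chemistry"
    user_intent: str   # stated goal supplied by the user


@dataclass
class SafeguardDecision:
    allowed: bool
    reasons: list = field(default_factory=list)


# Hypothetical deny-list standing in for domain-specific regulation.
RESTRICTED_DOMAINS = {"select-agent biology", "controlled substances"}


def human_regulation(action: ProposedAction) -> tuple[bool, str]:
    """Layer 1: defer to human oversight for regulated domains."""
    if action.domain in RESTRICTED_DOMAINS:
        return False, f"domain '{action.domain}' requires human sign-off"
    return True, "no regulatory flag"


def agent_alignment(action: ProposedAction) -> tuple[bool, str]:
    """Layer 2: check that the action matches the stated user intent.
    A real system would use a learned classifier; this keyword test is a stub."""
    if "harm" in action.user_intent.lower():
        return False, "stated intent conflicts with safety policy"
    return True, "intent consistent with policy"


def environmental_feedback(action: ProposedAction) -> tuple[bool, str]:
    """Layer 3: consult signals from the environment (e.g. sandbox dry runs
    or lab sensor limits) before any real-world execution. Stubbed here."""
    return True, "sandbox dry run reported no hazard"


def safeguard(action: ProposedAction) -> SafeguardDecision:
    """Run all three layers; a veto from any single layer blocks execution."""
    decision = SafeguardDecision(allowed=True)
    for layer in (human_regulation, agent_alignment, environmental_feedback):
        ok, reason = layer(action)
        decision.reasons.append(f"{layer.__name__}: {reason}")
        decision.allowed = decision.allowed and ok
    return decision


if __name__ == "__main__":
    action = ProposedAction(
        description="run docking simulation",
        domain="chemistry",
        user_intent="find candidate binders for protein Y",
    )
    print(safeguard(action))
```

The deliberate design choice in this sketch is that any one layer can veto an action, so the three safeguards compose as defense-in-depth rather than a majority vote; whether that matches the paper's intended interaction between human regulation and agent regulation is an assumption.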

