FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking.

Authors

Zhang Zaixi, Jin Ruofan, Xu Guangxue, Wang Xiaotong, Zitnik Marinka, Cong Le, Wang Mengdi

Affiliations

Princeton University, Princeton, NJ, USA.

Stanford University, Stanford, CA, USA.

Publication information

bioRxiv. 2025 Jun 4:2024.10.23.619960. doi: 10.1101/2024.10.23.619960.

Abstract

Proteins are the principal architects of life, fueling advances in bioengineering, drug discovery, and synthetic biology. The integration of generative AI with computational protein science has revolutionized protein design while also posing dual-use risks, such as enabling the creation of pandemic-capable proteins that require strong biosecurity safeguards. Here, we introduce FoldMark, a first-of-its-kind watermarking strategy leveraging distributional and evolutionary principles tailored for protein generative models, balancing watermark capacity and structural quality. FoldMark achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural quality (>0.9 scTM scores) for leading models including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA. For user tracing, FoldMark can successfully trace up to users. To validate FoldMark in wet lab, we applied it to structure-based design of EGFP and CRISPR-Cas13, showing wildtype-level function (98% fluorescence, 95% editing efficiency) and >90% watermark detection, demonstrating its practical utility for safeguarding AI-driven protein research.
