A machine learning toolkit for genetic engineering attribution to facilitate biosecurity.

Affiliations

Alt. Technology Labs, Inc., Berkeley, CA, USA.

Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.

Publication information

Nat Commun. 2020 Dec 8;11(1):6293. doi: 10.1038/s41467-020-19612-0.

DOI: 10.1038/s41467-020-19612-0

PMID: 33293535

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7722865/

Abstract

The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic to different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit which should promote responsible innovation and international security alike.
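The abstract reports calibration as a single number but does not say how it is computed. As a hedged illustration only, the sketch below assumes the common expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and the gap between mean confidence and observed accuracy in each bin is averaged, weighted by bin size. The function name, the bin count, and the toy inputs are assumptions made for this example, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - mean confidence| over bins, weighted by bin size.

    confidences: top-1 predicted probabilities in [0, 1] (assumed input format).
    correct: matching truthy/falsy flags, True when the top-1 lab was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # bins[b] collects (confidence, was_correct) pairs for equal-width bin b.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[b].append((conf, 1.0 if hit else 0.0))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Toy data: four predictions at 90% confidence, only half of them correct,
# so the model is overconfident by about 0.4.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
```

Under this reading, a value near zero means that a stated confidence can be taken at close to face value when weighing a prediction against other investigative evidence.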

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f09/7722865/58d9778c4a11/41467_2020_19612_Fig1_HTML.jpg

Similar articles

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity.

Nat Commun. 2020 Dec 8;11(1):6293. doi: 10.1038/s41467-020-19612-0.

The biosecurity benefits of genetic engineering attribution.

Nat Commun. 2020 Dec 8;11(1):6294. doi: 10.1038/s41467-020-19149-2.

Biosecurity. Synthetic biologists debate policing themselves.

Science. 2006 May 26;312(5777):1116. doi: 10.1126/science.312.5777.1116.

How machine learning could keep dangerous DNA out of terrorists' hands.

Nature. 2019 Feb;566(7742):19. doi: 10.1038/d41586-019-00277-9.

Analysis of the first genetic engineering attribution challenge.

Nat Commun. 2022 Nov 30;13(1):7374. doi: 10.1038/s41467-022-35032-8.

DNA synthesis and biological security.

Nat Biotechnol. 2007 Jun;25(6):627-9. doi: 10.1038/nbt0607-627.

Pathways to security.

Nature. 2008 Sep 25;455(7212):432. doi: 10.1038/455432a.

Improving lab-of-origin prediction of genetically engineered plasmids via deep metric learning.

Nat Comput Sci. 2022 Apr;2(4):253-264. doi: 10.1038/s43588-022-00234-z. Epub 2022 Apr 28.

US report pins down future biosecurity.

Nature. 2010 Aug 5;466(7307):678. doi: 10.1038/466678a.

Synthetic biologists face up to security issues.

Nature. 2005 Aug 18;436(7053):894-5. doi: 10.1038/436894a.

Cited by

Exo-Tox: Identifying Exotoxins from secreted bacterial proteins.

BioData Min. 2025 Aug 8;18(1):52. doi: 10.1186/s13040-025-00469-2.

Responsible AI in biotechnology: balancing discovery, innovation and biosecurity risks.

Front Bioeng Biotechnol. 2025 Feb 5;13:1537471. doi: 10.3389/fbioe.2025.1537471. eCollection 2025.

The need for adaptability in detection, characterization, and attribution of biosecurity threats.

Nat Commun. 2024 Dec 19;15(1):10699. doi: 10.1038/s41467-024-55436-y.

Synsor: a tool for alignment-free detection of engineered DNA sequences.

Front Bioeng Biotechnol. 2024 Jul 12;12:1375626. doi: 10.3389/fbioe.2024.1375626. eCollection 2024.

Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences.

PLoS One. 2024 May 28;19(5):e0304164. doi: 10.1371/journal.pone.0304164. eCollection 2024.

Artificial intelligence challenges in the face of biological threats: emerging catastrophic risks for public health.

Front Artif Intell. 2024 May 10;7:1382356. doi: 10.3389/frai.2024.1382356. eCollection 2024.

Cryptographic approaches to authenticating synthetic DNA sequences.

Trends Biotechnol. 2024 Aug;42(8):1002-1016. doi: 10.1016/j.tibtech.2024.02.002. Epub 2024 Feb 27.

Analysis of the first genetic engineering attribution challenge.

Nat Commun. 2022 Nov 30;13(1):7374. doi: 10.1038/s41467-022-35032-8.

Editorial: Recent advances in plant genetic engineering and innovative applications.

Front Plant Sci. 2022 Oct 19;13:1045417. doi: 10.3389/fpls.2022.1045417. eCollection 2022.

PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment.

Nat Commun. 2021 Feb 26;12(1):1167. doi: 10.1038/s41467-021-21180-w.
