Alt. Technology Labs, Inc., Berkeley, CA, USA.
Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
Nat Commun. 2020 Dec 8;11(1):6293. doi: 10.1038/s41467-020-19612-0.
The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic of different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet the problem is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit that should promote responsible innovation and international security alike.
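To make the notion of calibration concrete: a model is well calibrated when, among predictions made with confidence p, roughly a fraction p turn out to be correct. The abstract does not say which metric underlies the 1.6% figure, so the sketch below assumes one common choice, expected calibration error (ECE) with equal-width confidence bins. The function name, the bin count, and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE: weighted mean gap between confidence and accuracy.

    confidences: the model's top-1 predicted probability for each sample.
    correct:     whether that top-1 prediction (e.g. the lab) was right.
    n_bins:      number of equal-width bins over [0, 1] (an assumption).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per bin: [sample count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    ece = 0.0
    for count, conf_sum, hits in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = hits / count
        mean_conf = conf_sum / count
        ece += (count / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy data: ten predictions at 90% confidence, nine of them correct.
    confs = [0.9] * 10
    hits = [True] * 9 + [False]
    print(f"ECE = {expected_calibration_error(confs, hits):.4f}")
```

Under this reading, an ECE near zero means a stated confidence can be taken at roughly face value when it is weighed against other investigative evidence, whereas a large ECE means the confidence should be discounted.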