Deep learning to predict the lab-of-origin of engineered DNA.

Affiliation

Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.

Publication information

Nat Commun. 2018 Aug 7;9(1):3135. doi: 10.1038/s41467-018-05378-z.

Abstract

Genetic engineering projects are rapidly growing in scale and complexity, driven by new tools to design and construct DNA. There is increasing concern that widened access to these technologies could lead to attempts to construct cells for malicious purposes, to produce illegal drugs, or to steal intellectual property. Determining the origin of a DNA sequence is difficult and time-consuming. Here deep learning is applied to predict the lab-of-origin of a DNA sequence. A convolutional neural network was trained on the Addgene plasmid dataset, which contained 42,364 engineered DNA sequences from 2,230 labs as of February 2016. The network correctly identifies the source lab 48% of the time, and 70% of the time the source lab appears among the top 10 predicted labs. Often, there is not a single "smoking gun" that affiliates a DNA sequence with a lab. Rather, it is a combination of design choices that are individually common but collectively reveal the designer.
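
The two accuracy figures in the abstract correspond to top-1 and top-10 accuracy: how often the true lab is the highest-scoring prediction, and how often it falls among the ten highest-scoring predictions. The sketch below shows one way such a metric could be computed. It is illustrative only and is not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores and labels are assumptions made for this example.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    k: int,
) -> float:
    """Return the fraction of examples whose true label is among the
    k highest-scoring classes.

    scores[i][c] is the (hypothetical) model score for class c on example i.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one example is required")

    hits = 0
    for row, label in zip(scores, true_labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (invented for illustration): 4 sequences, 3 candidate labs.
toy_scores = [
    [0.7, 0.2, 0.1],  # true lab 0: ranked first
    [0.5, 0.3, 0.2],  # true lab 1: ranked second
    [0.1, 0.3, 0.6],  # true lab 0: ranked third
    [0.2, 0.7, 0.1],  # true lab 1: ranked first
]
toy_labels = [0, 1, 0, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(f"top-1 accuracy: {top1:.2f}, top-2 accuracy: {top2:.2f}")
```

With 2,230 candidate labs, `k=1` and `k=10` would give the two kinds of figures reported in the abstract; the toy data here uses three labs only to keep the example readable.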

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb5/6081423/036adf56e0da/41467_2018_5378_Fig1_HTML.jpg
