Towards AI-designed genomes using a variational autoencoder.

Authors

Dudek Natasha K, Precup Doina

Affiliations

School of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada.

Mila-Québec Artificial Intelligence Institute, Montreal, QC H2S 3H1, Canada.

Publication information

Proc Biol Sci. 2024 Dec;291(2036):20241457. doi: 10.1098/rspb.2024.1457. Epub 2024 Dec 11.

Abstract

Genomes encode elaborate networks of genes whose products must seamlessly interact to support living organisms. Humans' capacity to understand these biological systems is limited by their sheer size and complexity. In this article, we develop a proof-of-concept framework for training a machine learning (ML) algorithm to model bacterial genome composition. To achieve this, we create simplified representations of genomes in the form of binary vectors that indicate the encoded genes, henceforth referred to as genome vectors. A denoising variational autoencoder was trained to accept corrupted genome vectors, in which most genes had been masked, and reconstruct the original. The resulting model, DeepGenomeVector, effectively captures complex dependencies in genomic networks, as evaluated by both qualitative and quantitative metrics. An in-depth functional analysis of a generated genome vector shows that its encoded pathways are interconnected, near complete, and ecologically cohesive. On the test set, where the model's ability to reconstruct uncorrupted genome vectors was evaluated, area under the receiver operating characteristic curve (AUROC) and F1 scores of 0.98 and 0.83, respectively, support the model's strong performance. This article showcases the power of ML approaches for synthetic biology and highlights the possibility that artificial intelligence agents may one day be able to design genomes that animate carbon-based cells.
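The data preparation described in the abstract can be illustrated with a minimal sketch. It shows only the genome vector encoding, the masking corruption, and an F1 comparison; it is not the authors' code and does not include the variational autoencoder itself. The gene names, the `keep_fraction` value, and all function names are hypothetical, since the abstract does not state the gene vocabulary or the exact masking rate.

```python
import random

def to_genome_vector(genes, vocabulary):
    """Return a binary vector: 1 where the vocabulary gene is encoded, else 0."""
    present = set(genes)
    return [1 if gene in present else 0 for gene in vocabulary]

def corrupt(vector, keep_fraction, rng):
    """Mask most encoded genes, keeping about keep_fraction of them (assumed rate)."""
    present = [i for i, bit in enumerate(vector) if bit == 1]
    n_keep = max(1, round(len(present) * keep_fraction)) if present else 0
    kept = set(rng.sample(present, n_keep))
    return [1 if i in kept else 0 for i in range(len(vector))]

def f1_score(original, reconstructed):
    """F1 between the original vector and a binary reconstruction of it."""
    pairs = list(zip(original, reconstructed))
    tp = sum(1 for o, r in pairs if o == 1 and r == 1)
    fp = sum(1 for o, r in pairs if o == 0 and r == 1)
    fn = sum(1 for o, r in pairs if o == 1 and r == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical six-gene vocabulary; a real vocabulary would be far larger.
vocabulary = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
genome = to_genome_vector(["geneA", "geneC", "geneD", "geneF"], vocabulary)
corrupted = corrupt(genome, keep_fraction=0.25, rng=random.Random(0))

# Scoring the corrupted input against the original gives a "no model" baseline
# that a trained denoising model would be expected to improve on.
baseline_f1 = f1_score(genome, corrupted)
```

In this toy setting the corrupted vector would be the model input and the original vector the reconstruction target; a trained model's output, thresholded to binary, would be scored with `f1_score` in place of `corrupted`.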
