Cracking the genetic code with neural networks.

Author information

Joiret Marc, Leclercq Marine, Lambrechts Gaspard, Rapino Francesca, Close Pierre, Louppe Gilles, Geris Liesbet

Affiliations

Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium.

Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium.

Publication information

Front Artif Intell. 2023 Apr 6;6:1128153. doi: 10.3389/frai.2023.1128153. eCollection 2023.

Abstract

The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could rediscover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcripts and proteins. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated how a codon embedding layer, which assesses the semantic similarity between codons, affects the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins to decipher the codons of rare amino acids faster. Deep neural networks require a huge amount of training data, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4-22 million cumulative codon-amino acid pairs presented to the neural network over around 7-40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
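To make the deciphering task concrete, the following is a minimal sketch and not the architectures compared in the study. It assumes codons are already aligned with their amino acids, replaces the deep networks with a single softmax layer over one-hot codons, and draws training pairs from a synthetic, skewed codon distribution rather than from human transcripts. The names `sample_pairs`, `train`, and `decode` are hypothetical helpers introduced only for illustration.

```python
import math
import random

# Standard genetic code (RNA alphabet), enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO))   # ground truth, used only as labels
CLASSES = sorted(set(AMINO))              # 20 amino acids plus '*' (stop)


def sample_pairs(n, rng):
    """Draw n (codon, amino acid) pairs.

    Assumption: codon frequencies follow an arbitrary 1/(rank+1) skew to
    mimic unbalanced usage; this is NOT real human transcriptome data.
    """
    weights = [1.0 / (rank + 1) for rank in range(len(CODONS))]
    drawn = rng.choices(CODONS, weights=weights, k=n)
    return [(codon, GENETIC_CODE[codon]) for codon in drawn]


def train(pairs, epochs=1, lr=0.5):
    """Train a softmax classifier with plain SGD on cross-entropy loss.

    With one-hot inputs, the logits for a codon are simply its row of the
    weight matrix, so each update touches a single row.
    """
    codon_index = {c: i for i, c in enumerate(CODONS)}
    class_index = {k: i for i, k in enumerate(CLASSES)}
    weights = [[0.0] * len(CLASSES) for _ in CODONS]
    for _ in range(epochs):
        for codon, amino_acid in pairs:
            row = weights[codon_index[codon]]
            top = max(row)
            exps = [math.exp(v - top) for v in row]   # numerically stable softmax
            total = sum(exps)
            target = class_index[amino_acid]
            for j in range(len(row)):
                grad = exps[j] / total - (1.0 if j == target else 0.0)
                row[j] -= lr * grad
    return weights


def decode(weights):
    """Read the learned dictionary; codons never seen in training map to None."""
    table = {}
    for codon, row in zip(CODONS, weights):
        if max(row) == min(row):          # row still at its zero initialisation
            table[codon] = None
        else:
            best = max(range(len(CLASSES)), key=lambda j: row[j])
            table[codon] = CLASSES[best]
    return table


if __name__ == "__main__":
    rng = random.Random(0)
    learned = decode(train(sample_pairs(20000, rng)))
    correct = sum(learned[c] == GENETIC_CODE[c] for c in CODONS)
    print(f"{correct}/64 codons deciphered")
```

Because the pairs in this toy setting are noiseless and pre-aligned, a codon is deciphered as soon as it has been seen, so the only obstacle is that rare codons appear late in the sample. The far larger training sets reported in the abstract belong to the study's actual setting, which this sketch does not reproduce.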

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f250/10117997/0cd490a24d3d/frai-06-1128153-g0001.jpg
