sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks.

Author affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Department of System Integration, Sparebanken Vest, Bergen, Norway.

Publication information

Plant Mol Biol. 2021 Mar;105(4-5):483-495. doi: 10.1007/s11103-020-01102-y. Epub 2021 Jan 1.

Abstract

We propose sgRNACNN, an ensemble convolutional neural network model that identifies sgRNAs with high on-target activity in four crops, using one-hot encoding and k-mers to encode sequences. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays a key role in gene redirection and editing. sgRNA has contributed to the improvement of agronomic species, but effective bioinformatics tools for identifying sgRNA activity in these species are lacking. A machine learning method for identifying sgRNAs with high on-target activity is therefore needed. In this work, we propose a simple convolutional neural network method for this task. Sequence data are converted using one-hot encoding and k-mers, and a voting algorithm combines the convolutional neural networks into the ensemble model sgRNACNN, which predicts sgRNA activity. sgRNACNN was applied to four crops, Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum, with accuracies of 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results show that sgRNACNN can identify sgRNAs with high on-target activity in agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results may help guide crop gene editing and academic research. The source code and relevant dataset are available at https://github.com/nmt315320/sgRNACNN.git .
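The abstract names three building blocks: one-hot encoding, k-mer features, and voting across several networks. The sketch below illustrates each in plain Python. It is not the authors' implementation: the function names, the choice of k = 2, the A/C/G/T column order, and the hard-majority rule with ties resolved to the negative class are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumed column order; the abstract does not specify one


def one_hot(seq):
    """Encode a DNA sequence as one 4-element 0/1 row per base.

    Characters outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_counts(seq, k=2):
    """Count overlapping k-mers as a fixed-length vector over all 4**k k-mers.

    k=2 is an illustrative default, not a value taken from the paper.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(BASES, repeat=k)]


def majority_vote(predictions):
    """Combine binary labels (1 = high activity) from several models.

    Hypothetical hard-voting rule: a strict majority of 1s yields 1,
    anything else (including a tie) yields 0.
    """
    return 1 if sum(predictions) * 2 > len(predictions) else 0


if __name__ == "__main__":
    guide = "ACGT"  # toy sequence, not from the paper's dataset
    print(one_hot(guide))
    print(kmer_counts(guide, k=2))
    print(majority_vote([1, 1, 0]))
```

In a full pipeline, the one-hot matrix and the k-mer vector would be the inputs to the individual convolutional networks, and `majority_vote` would be applied to their per-sequence labels. The repository linked in the abstract is the reference for how the authors actually encode sequences and combine models.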

