Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Department of System Integration, Sparebanken Vest, Bergen, Norway.
Plant Mol Biol. 2021 Mar;105(4-5):483-495. doi: 10.1007/s11103-020-01102-y. Epub 2021 Jan 1.
We propose an ensemble convolutional neural network model that identifies sgRNAs with high on-target activity in four crops, using one-hot encoding and k-mers to encode the sequences. As a core component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has contributed substantially to the improvement of agronomic species, yet effective bioinformatics tools for identifying sgRNA activity in these species are lacking. A machine-learning method for identifying sgRNAs with high on-target activity is therefore needed. In this work, we propose a simple convolutional neural network method for this task. Sequence data are converted with one-hot encoding and k-mers, and a voting algorithm combines the convolutional neural networks into the ensemble model sgRNACNN, which predicts sgRNA activity. sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. Its accuracy on the four crops was 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results show that sgRNACNN can identify sgRNAs with high on-target activity in agronomic data and can, to a certain extent, meet the demand for sgRNA activity prediction in agronomy. These results are relevant for guiding crop gene editing and academic research. The source code and the relevant dataset are available at https://github.com/nmt315320/sgRNACNN.git .
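The abstract names three building blocks: one-hot encoding, k-mer encoding, and voting across models. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. The function names (`one_hot`, `kmer_counts`, `majority_vote`), the choice of k = 2, and the use of simple majority voting over binary labels are all assumptions made for illustration; the abstract does not specify the value of k or the exact voting rule.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumed DNA alphabet for the guide sequence


def one_hot(seq):
    """Encode a sequence as one 4-element row per base, in A, C, G, T order."""
    rows = []
    for base in seq.upper():
        # Symbols outside ACGT (for example N) become an all-zero row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def kmer_counts(seq, k=2):
    """Count overlapping k-mers and return them in a fixed vocabulary order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # The vocabulary is every length-k string over ACGT, in lexicographic order.
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    return [counts[kmer] for kmer in vocab]


def majority_vote(labels):
    """Return the most frequent label among the individual model predictions.

    With an odd number of binary voters there is always a strict majority.
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt guide, not from the dataset
    print(len(one_hot(guide)), "rows of one-hot features")
    print(len(kmer_counts(guide, k=2)), "2-mer count features")
    # Hypothetical predictions from three models (1 = high activity).
    print("ensemble label:", majority_vote([1, 0, 1]))
```

In a sketch like this, the one-hot matrix would feed a convolutional model directly, the k-mer vector gives a fixed-length alternative view of the same sequence, and the vote combines the labels produced by separately trained models.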