He Bifang, Huang Jian, Chen Heng
Medical College, Guizhou University, Jiaxiu Road, Huaxi Zone, Guiyang 550025, P. R. China.
Center for Informational Biology, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, P. R. China.
J Bioinform Comput Biol. 2019 Dec;17(6):1950039. doi: 10.1142/S0219720019500392.
Plant-exclusive virus-derived small interfering RNAs (vsiRNAs) regulate various biological processes and are especially important in antiviral immunity. Identifying plant vsiRNAs is important for understanding their biogenesis and functional mechanisms and for further developing antiviral plants. In this study, we extracted plant vsiRNA sequences from the PVsiRNAdb database. We then used a deep convolutional neural network (CNN) to develop PVsiRNAPred, a deep learning algorithm that predicts plant vsiRNAs from vsiRNA sequence composition. The key part of PVsiRNAPred is the CNN module, which automatically learns hierarchical representations of vsiRNA sequences related to vsiRNA profiles in plants. When evaluated on an independent testing dataset, the model achieved an accuracy of 65.70%, higher than that of five classifiers based on conventional machine learning methods. In the same independent test, PVsiRNAPred obtained a sensitivity of 67.11%, a specificity of 64.26%, a Matthews correlation coefficient (MCC) of 0.31, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.71. The permutation test with 1000 shuffles resulted in a p value . These results indicate that PVsiRNAPred has favorable generalization capabilities. We hope that PVsiRNAPred, the first bioinformatics algorithm for predicting plant vsiRNAs, will allow efficient discovery of new vsiRNAs.
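The evaluation measures reported in the abstract (accuracy, sensitivity, specificity, MCC) and a label-shuffling permutation test can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names `binary_metrics` and `permutation_p_value` are hypothetical, the confusion-matrix counts are invented and are not the study's test set, and the permutation scheme shown (shuffling the true labels against fixed predictions) may differ from the procedure the authors actually used.

```python
import math
import random


def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        # MCC is defined as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def accuracy(y_true, y_pred):
    """Fraction of positions where the label and the prediction agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_shuffles=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones.

    Assumption: the labels are shuffled and the predictions are held fixed.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    labels = list(y_true)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        if accuracy(labels, y_pred) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 200 sequences.
    print(binary_metrics(tp=67, fn=33, tn=64, fp=36))

    # Hypothetical labels and predictions: 20 sequences, all predicted correctly.
    y_true = [1] * 10 + [0] * 10
    y_pred = list(y_true)
    print(permutation_p_value(y_true, y_pred, n_shuffles=1000))
```

With the add-one correction, the smallest p value this estimator can return for 1000 shuffles is 1/1001, so the number of shuffles bounds the resolution of the test.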