Liu Xueyan, Zhang Hongyan, Zeng Ying, Zhu Xinghui, Zhu Lei, Fu Jiahui
College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China.
School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, China.
Genes (Basel). 2024 Mar 26;15(4):404. doi: 10.3390/genes15040404.
The precise identification of splice sites is essential for unraveling the structure and function of genes and is a pivotal step in gene annotation. In this study, we developed a novel deep learning model, DRANetSplicer, that integrates residual learning and attention mechanisms to capture the intricate features of splice sites more accurately. We constructed multiple datasets using the most recent versions of genomic data from three different organisms, , and . This approach allows us to train models with a richer set of high-quality data. DRANetSplicer outperformed benchmark methods on both the donor and acceptor splice site datasets, achieving average accuracies of 96.57% (donor) and 95.82% (acceptor) across the three organisms. Comparative analyses with benchmark methods, including SpliceFinder, Splice2Deep, Deep Splicer, EnsembleSplice, and DNABERT, revealed DRANetSplicer's superior predictive performance, with a relative reduction in average error rate of at least 4.2% (donor) and 11.6% (acceptor). We used the DRANetSplicer model trained on data to predict splice sites in , achieving donor and acceptor accuracies of 94.89% and 94.25%, respectively. These results indicate that DRANetSplicer has excellent cross-organism predictive capability: its performance in cross-organism prediction even surpasses that of the benchmark methods in within-organism prediction. Cross-organism validation showed that DRANetSplicer predicts splice sites well across similar organisms, supporting its applicability to gene annotation in understudied organisms. We employed multiple methods to visualize the model's decision-making process. The visualization results indicate that DRANetSplicer learns and interprets well-known biological features, further validating its overall performance.
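The relative error reductions quoted above can be reproduced from accuracy figures with a short calculation. This is a minimal sketch that assumes error rate = 100% − accuracy and relative reduction = (baseline error − model error) / baseline error; the abstract does not spell out the formula. The baseline accuracy of 96.0% in the usage line is a hypothetical value for illustration only, not a result from the study.

```python
def error_rate(accuracy_pct: float) -> float:
    """Error rate (%) implied by an accuracy given in percent (assumed: 100 - accuracy)."""
    return 100.0 - accuracy_pct


def relative_error_reduction(model_acc_pct: float, baseline_acc_pct: float) -> float:
    """Relative reduction (%) in error rate of a model versus a baseline.

    A positive value means the model makes proportionally fewer errors.
    """
    e_model = error_rate(model_acc_pct)
    e_base = error_rate(baseline_acc_pct)
    if e_base == 0.0:
        raise ValueError("baseline has zero error; relative reduction is undefined")
    return 100.0 * (e_base - e_model) / e_base


# 96.57% is the reported average donor accuracy; 96.0% is a HYPOTHETICAL baseline.
print(round(relative_error_reduction(96.57, 96.0), 2))  # 14.25
```

Because the reduction is taken relative to the baseline's error, small accuracy gaps near 100% translate into large relative improvements, which is why the abstract reports error-rate reductions alongside raw accuracies.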
Our study systematically examined and confirmed the predictive ability of DRANetSplicer at multiple levels and from multiple perspectives, indicating that its practical use in gene annotation is justified.
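The abstract says only that DRANetSplicer combines residual learning with attention mechanisms; it does not describe the layers. The sketch below is therefore a generic, hypothetical illustration of those two ideas, not the DRANetSplicer architecture: a position-wise dense layer with ReLU stands in for convolutional layers, a softmax over positions produces attention weights, and a skip connection adds the attention-weighted branch back onto the input. All function names, shapes, and parameters (`w`, `b`, `query`) are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Position-wise linear map: y[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_attention_block(seq: Matrix, w: Matrix, b: Vector, query: Vector) -> Matrix:
    """HYPOTHETICAL block: y_i = x_i + a_i * F(x_i), with a = softmax over positions.

    seq   -- positions x channels feature map (e.g. an encoded sequence window)
    w, b  -- square weights (channels x channels) so F(x_i) has x_i's width
    query -- learned vector that scores each position for attention
    """
    # Residual branch F: position-wise dense layer + ReLU (stand-in for conv layers).
    f = [relu(dense(x, w, b)) for x in seq]
    # Attention: one score per position, normalized with softmax across positions.
    scores = [sum(q * fi for q, fi in zip(query, fx)) for fx in f]
    attn = softmax(scores)
    # Skip connection: add the attention-weighted branch back onto the input.
    return [[xi + a * fi for xi, fi in zip(x, fx)] for x, fx, a in zip(seq, f, attn)]


seq = [[1.0, -2.0], [3.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_attention_block(seq, identity, [0.0, 0.0], [0.0, 0.0]))
# [[1.5, -2.0], [4.5, 0.0]]
```

The skip connection is what makes this "residual": if the branch weights are all zero, the block returns its input unchanged, so deeper stacks can start from an identity mapping and learn only corrections. The attention weights let the block emphasize some positions in the window over others.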