Liang Jiuxing, Cui Zifeng, Wu Canbiao, Yu Yao, Tian Rui, Xie Hongxian, Jin Zhuang, Fan Weiwen, Xie Weiling, Huang Zhaoyue, Xu Wei, Zhu Jingjing, You Zeshan, Guo Xiaofang, Qiu Xiaofan, Ye Jiahao, Lang Bin, Li Mengyuan, Tan Songwei, Hu Zheng
Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou, China.
Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou 510631, China.
Bioinformatics. 2021 Oct 25;37(20):3405-3411. doi: 10.1093/bioinformatics/btab388.
Epstein-Barr virus (EBV) is one of the most prevalent oncogenic DNA viruses. The integration of EBV into the host genome has been reported to play an important role in cancer development. EBV integration preference depends strongly on the local genomic environment, which makes it possible to predict EBV integration sites.
An attention-based deep learning model, DeepEBV, was developed to predict EBV integration sites by learning local genomic features automatically. First, DeepEBV was trained and tested using data from the dsVIS database. The results showed that DeepEBV with EBV integration sequences plus Repeat peaks and 2-fold data augmentation performed the best on the training dataset. The performance of the model was then validated on an independent dataset. In addition, motifs of DNA-binding proteins could influence the selection preference of viral insertional mutagenesis. Finally, the results showed that DeepEBV can predict EBV integration hotspot genes accurately. In summary, DeepEBV is a robust, accurate and explainable deep learning model, providing novel insights into EBV integration preferences and mechanisms.
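To make the input preparation concrete, the following is a minimal sketch of how DNA windows around candidate sites might be one-hot encoded and doubled in number before training. It is not taken from the DeepEBV code: the abstract does not say how the 2-fold augmentation is performed, so the use of reverse complements here is an assumption, and the helper names `one_hot`, `reverse_complement` and `augment_two_fold` are hypothetical.

```python
# Hypothetical preprocessing sketch; not the DeepEBV implementation.
# Assumption: 2-fold augmentation is approximated by adding the
# reverse complement of every input sequence.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

# Channel order A, C, G, T; any other base (e.g. N) maps to all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows, one per base."""
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in seq.upper()]


def augment_two_fold(seqs):
    """Return each sequence followed by its reverse complement."""
    augmented = []
    for seq in seqs:
        augmented.append(seq.upper())
        augmented.append(reverse_complement(seq))
    return augmented


if __name__ == "__main__":
    windows = ["ACGTN", "aac"]
    for seq in augment_two_fold(windows):
        print(seq, one_hot(seq))
```

Under these assumptions, the encoded matrices would be the input to a sequence model; the attention layers and the Repeat peak features mentioned above are outside the scope of this sketch.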
DeepEBV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepEBV.git.
Supplementary data are available at Bioinformatics online.