Miao Yan, Bian Jilong, Dong Guanghui, Dai Tianhong
College of Computer and Control Engineering, Northeast Forestry University, Harbin, China.
Front Microbiol. 2023 Jun 16;14:1169791. doi: 10.3389/fmicb.2023.1169791. eCollection 2023.
A metagenome contains all DNA sequences from an environmental sample, including those of viruses, bacteria, archaea, and eukaryotes. Because viruses are extremely abundant and, as a major class of pathogens, have historically caused substantial mortality and morbidity in human populations, detecting viruses in metagenomes is crucial for analyzing the viral component of a sample and is the first step in clinical diagnosis. However, detecting viral fragments directly from metagenomes remains difficult because metagenomes contain a large number of short sequences. In this study, a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes (DETIRE) is proposed to address this problem. First, a graph-based nucleotide sequence embedding strategy is used to train an embedding matrix that enriches the representation of DNA sequences. Then, spatial and sequential features are extracted by trained CNN and BiLSTM networks, respectively, to enrich the features of short sequences. Finally, the two sets of features are combined with weights to make the final decision. Trained on 220,000 sequences of 500 bp subsampled from the Virus and Host RefSeq genomes, DETIRE identifies more short viral sequences (<1,000 bp) than three recent methods: DeepVirFinder, PPR-Meta, and CHEER. DETIRE is freely available on GitHub (https://github.com/crazyinter/DETIRE).
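The final step of the pipeline, a weighted combination of the CNN (spatial) and BiLSTM (sequential) feature sets followed by a viral/non-viral decision, can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or what the decision layer looks like, so everything here is an assumption: the fusion weight `alpha`, the linear layer (`weights`, `bias`), the sigmoid output, the 0.5 `threshold`, and all function names are hypothetical. The two feature vectors are passed in as plain lists standing in for trained network outputs. This is not the DETIRE implementation.

```python
import math
from typing import List, Sequence, Tuple


def weighted_fusion(cnn_features: Sequence[float],
                    lstm_features: Sequence[float],
                    alpha: float) -> List[float]:
    """Hypothetical element-wise weighted sum of two equal-length feature vectors.

    alpha weights the CNN (spatial) features; 1 - alpha weights the
    BiLSTM (sequential) features.
    """
    if len(cnn_features) != len(lstm_features):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * c + (1.0 - alpha) * s
            for c, s in zip(cnn_features, lstm_features)]


def viral_probability(fused: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Hypothetical decision layer: a linear score passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(cnn_features: Sequence[float],
             lstm_features: Sequence[float],
             alpha: float,
             weights: Sequence[float],
             bias: float,
             threshold: float = 0.5) -> Tuple[float, bool]:
    """Fuse both feature sets and return (probability, is_viral)."""
    fused = weighted_fusion(cnn_features, lstm_features, alpha)
    p = viral_probability(fused, weights, bias)
    return p, p >= threshold


if __name__ == "__main__":
    # Toy feature vectors standing in for trained CNN and BiLSTM outputs.
    p, is_viral = classify([1.0, 0.0], [0.0, 1.0], alpha=0.5,
                           weights=[2.0, 2.0], bias=-1.0)
    print(f"p(viral) = {p:.4f}, viral = {is_viral}")
```

With equal weighting (`alpha=0.5`) the toy vectors fuse to `[0.5, 0.5]`. The linear score is then `2*0.5 + 2*0.5 - 1 = 1`, giving a sigmoid probability of about 0.731. Setting `alpha` to 1.0 or 0.0 reduces the model to a CNN-only or BiLSTM-only classifier, which is one way to see why a hybrid weighting may help on short fragments.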