New proposal of viral genome representation applied in the classification of SARS-CoV-2 with deep learning.

Affiliations

Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal, RN, 59078-970, Brazil.

Department of Pharmacy and Pharmaceutical Technology, University of Granada, Granada, Spain.

Publication information

BMC Bioinformatics. 2023 Mar 11;24(1):92. doi: 10.1186/s12859-023-05188-1.

DOI:10.1186/s12859-023-05188-1

PMID:36906520

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10007673/

Abstract

BACKGROUND

In December 2019, the first case of COVID-19 was described in Wuhan, China, and by July 2022, there were already 540 million confirmed cases. Due to the rapid spread of the virus, the scientific community has made efforts to develop techniques for the viral classification of SARS-CoV-2.

RESULTS

In this context, we developed a new gene sequence representation based on Genomic Signal Processing techniques. First, we applied the mapping approach to samples of six viral species of the Coronaviridae family, to which SARS-CoV-2 belongs. We then used the downsized sequences obtained by the proposed method as input to a deep learning architecture for viral classification, achieving accuracies of 98.35%, 99.08%, and 99.69% for viral signatures of size 64, 128, and 256, respectively, and a precision of 99.95% for the vectors of size 256.

CONCLUSIONS

Compared with results produced using other state-of-the-art representation techniques, the classification results obtained demonstrate that the proposed mapping can deliver satisfactory performance at low memory and processing-time cost.
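
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea described in RESULTS: turning a nucleotide sequence into a fixed-length numeric signature (for example of size 64, 128, or 256) that could then be fed to a classifier. The integer encoding, the block-averaging step, and the names `NUCLEOTIDE_VALUES`, `encode`, `downsize`, and `viral_signature` are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical illustration only: the numeric encoding and the block-averaging
# reduction below are assumptions, not the mapping proposed in the paper.
NUCLEOTIDE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(sequence):
    """Map each nucleotide to a number; unknown symbols (e.g. N) become 0.0."""
    return [NUCLEOTIDE_VALUES.get(base, 0.0) for base in sequence.upper()]


def downsize(signal, size):
    """Reduce a numeric signal to `size` values by averaging contiguous blocks."""
    if size <= 0:
        raise ValueError("size must be positive")
    n = len(signal)
    if n < size:
        raise ValueError("signal is shorter than the requested size")
    values = []
    for i in range(size):
        # Integer block boundaries; every block is non-empty because n >= size.
        start = i * n // size
        end = (i + 1) * n // size
        block = signal[start:end]
        values.append(sum(block) / len(block))
    return values


def viral_signature(sequence, size):
    """Encode a sequence and shrink it to a fixed-length vector."""
    return downsize(encode(sequence), size)
```

Under these assumptions, a call such as `viral_signature(genome, 256)` would yield a 256-element vector regardless of genome length, which is what makes a fixed-input deep learning architecture applicable to sequences of different sizes.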

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9124/10007789/74ed9fd7e0af/12859_2023_5188_Fig1_HTML.jpg
