Chaos game representation dataset of SARS-CoV-2 genome.

Authors

Barbosa Raquel de M, Fernandes Marcelo A C

Affiliations

MIT Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.

Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil.

Publication information

Data Brief. 2020 Apr 25;30:105618. doi: 10.1016/j.dib.2020.105618. eCollection 2020 Jun.

Abstract

As of April 16, 2020, the novel coronavirus disease (COVID-19) had spread to more than 185 countries/regions, with more than 2,000,000 confirmed cases and more than 142,000 deaths. In bioinformatics, one crucial task is the analysis of the virus nucleotide sequences using approaches such as data stream, digital signal processing, and machine learning techniques and algorithms. To make these approaches feasible, however, the nucleotide sequence strings must first be transformed into a numerical representation. This dataset therefore provides a chaos game representation (CGR) of SARS-CoV-2 virus nucleotide sequences. It contains the CGR of 100 instances of SARS-CoV-2 virus, 11,540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
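
To illustrate how a nucleotide string can be turned into numerical values, the following is a minimal Python sketch of the classic chaos game representation. It is not the authors' code. The corner layout (A, C, G, T on the corners of the unit square), the starting point (0.5, 0.5), the handling of U and of ambiguous symbols, and the function name `cgr_points` are all assumptions made for illustration; the dataset may use a different convention.

```python
from typing import Dict, List, Tuple

# Assumed corner layout (a commonly used convention); the dataset
# described above may assign nucleotides to corners differently.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence: str) -> List[Tuple[float, float]]:
    """Return one (x, y) point per recognized nucleotide in `sequence`.

    Each point is the midpoint between the previous point and the
    corner assigned to the current nucleotide, starting from the
    center of the unit square.
    """
    x, y = 0.5, 0.5  # assumed starting point: center of the square
    points: List[Tuple[float, float]] = []
    for base in sequence.upper():
        if base == "U":  # treat RNA uracil as thymine (assumption)
            base = "T"
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N (assumption)
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ATGC"):
        print(point)
```

Because every point depends on all preceding nucleotides, the resulting coordinate sequence can be fed to signal processing or machine learning methods in place of the raw string.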

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e00/7236614/836091e47af7/gr1.jpg
