CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads.

Authors

Borozan Ivan, Ferretti Vincent

Affiliation

Informatics and Bio-computing, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, Canada.

Publication information

Bioinformatics. 2016 Feb 1;32(3):453-5. doi: 10.1093/bioinformatics/btv587. Epub 2015 Oct 9.

Abstract

SUMMARY

Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. Emerging long-read sequencing technologies can now produce reads tens of kilobases in length, which promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model, which we have demonstrated to improve classification accuracy compared with other state-of-the-art classification methods. To validate the combined sequence similarity score classifier (CSSSCL) and demonstrate its usefulness, we test it on three different datasets, including a metagenomic dataset composed of short reads.
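The abstract does not describe how the similarity scores are computed or combined, so the following is only a minimal sketch of the general idea of classifying a read by a combined similarity score. It is not the CSSSCL model or its API. The two measures used here (cosine similarity of k-mer counts and Jaccard similarity of k-mer sets), the equal weighting, and all function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in an upper-cased DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity between the sets of k-mers present in each sequence."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return len(a.keys() & b.keys()) / len(union)


def combined_score(query, reference, k=3, weight=0.5):
    """Weighted average of the two measures (hypothetical combination rule)."""
    q, r = kmer_counts(query, k), kmer_counts(reference, k)
    return weight * cosine_similarity(q, r) + (1 - weight) * jaccard_similarity(q, r)


def classify(query, references, k=3, weight=0.5):
    """Return the label of the reference with the highest combined score."""
    return max(
        references,
        key=lambda label: combined_score(query, references[label], k, weight),
    )


# Toy reference sequences; labels and sequences are made up for illustration.
references = {
    "taxon_a": "ACGTACGTACGTACGT",
    "taxon_b": "TTTTGGGGTTTTGGGG",
}
predicted = classify("ACGTACGTAC", references)
```

In this toy setup the query shares all of its 3-mers with `taxon_a` and none with `taxon_b`, so the combined score selects `taxon_a`. A real classifier would use much larger reference collections and would need to learn or tune how the individual scores are weighted.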

AVAILABILITY AND IMPLEMENTATION

The package's source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl.

CONTACT

ivan.borozan@oicr.on.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
