Chaudhary Nikhil, Sharma Ashok K, Agarwal Piyush, Gupta Ankit, Sharma Vineet K
MetaInformatics Laboratory, Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Madhya Pradesh, India.
MetaInformatics Laboratory, Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Madhya Pradesh, India; Department of Physics, Indian Institute of Science Education and Research Bhopal, Madhya Pradesh, India.
PLoS One. 2015 Feb 3;10(2):e0116106. doi: 10.1371/journal.pone.0116106. eCollection 2015.
The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With rapid developments in genome sequencing technologies, the focus has shifted from sequencing the full-length 16S rRNA gene towards sequencing its hypervariable regions. We therefore developed 16S Classifier, which uses a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It achieved precision values of up to 0.91 on the training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is freely available at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier.
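A Random Forest needs each short read converted into a fixed-length numeric vector before training or prediction. The abstract does not say which features 16S Classifier uses, so the sketch below assumes one common choice for short DNA sequences: normalized k-mer (oligonucleotide) frequencies. The function name `kmer_vector` and the default `k = 4` are hypothetical and chosen only for illustration; this is not the published tool's encoding.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 4) -> list[float]:
    """Hypothetical featurizer: normalized k-mer frequencies of a DNA read.

    Assumption: features are overlapping k-mer counts over A/C/G/T, divided by
    the number of valid windows. The real 16S Classifier features may differ.
    """
    # Fixed, lexicographically ordered k-mer vocabulary (4**k entries), so every
    # read maps to a vector of the same length and column order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    # Slide a window of width k across the read, one base at a time.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing N or other ambiguity codes are skipped
            counts[j] += 1
            total += 1

    # Normalize so reads of different lengths are comparable.
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

Each returned vector could serve as one row of the feature matrix passed to a Random Forest implementation, for example scikit-learn's `RandomForestClassifier.fit(X, y)`, with the taxon label of each training read as `y`. Normalizing by window count keeps hypervariable-region reads of different lengths on the same scale.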