matK-QR classifier: a patterns based approach for plant species identification.

Author information

More Ravi Prabhakar, Mane Rupali Chandrashekhar, Purohit Hemant J

Affiliations

Environmental Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, 440020 Maharashtra, India; Present Institute: Division of Molecular Entomology, ICAR-National Bureau of Agricultural Insect Resources (NBAIR), Hebbal, Bengaluru, 560024 Karnataka, India.

MDS Bio-Analytics, Nagpur, 440020 Maharashtra India.

Publication information

BioData Min. 2016 Dec 9;9:39. doi: 10.1186/s13040-016-0120-6. eCollection 2016.

Abstract

BACKGROUND

DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short, standardized segment of the genome. The nucleotide sequences of the maturase K (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e., a regular expression) for plant species identification.

METHODS

To generate molecular signatures, we used the matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL Plant Working Group. First, we performed Multiple Sequence Alignment (MSA) of all species, followed by construction of a Position Specific Scoring Matrix (PSSM) for both loci, to obtain the percentage of discrimination among species. Next, we detected Discriminating Patterns (DP) at the genus and species levels using the PSSM for the matK dataset. Combining DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we compared these signatures with existing methods, including BLASTn, Support Vector Machines (SVM), JRip (RIPPER), J48 (C4.5 algorithm), and Naïve Bayes (NB), against the NCBI GenBank matK dataset.
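The step of combining discriminating patterns with consecutive pattern distances can be illustrated with a minimal Python sketch. It assumes a signature is an ordered list of patterns joined by fixed-length wildcard gaps whose lengths equal the distances between consecutive patterns. The helper name `build_signature`, the toy patterns, the distances, and the toy sequence are all hypothetical and are not taken from the paper.

```python
import re


def build_signature(patterns, distances):
    """Join ordered nucleotide patterns into one regular expression.

    patterns  -- ordered discriminating patterns (hypothetical toy values)
    distances -- number of arbitrary bases between consecutive patterns
    """
    if len(distances) != len(patterns) - 1:
        raise ValueError("need exactly one distance between each pair of patterns")
    parts = [re.escape(patterns[0])]
    for gap, pattern in zip(distances, patterns[1:]):
        parts.append("[ACGTN]{%d}" % gap)  # fixed-length wildcard gap (assumption)
        parts.append(re.escape(pattern))
    return "".join(parts)


# Toy example: three patterns separated by gaps of 3 and 5 bases.
signature = build_signature(["ATGGA", "TTCAC", "GGAT"], [3, 5])
toy_sequence = "CCATGGAAAATTCACGTGTAGGATCC"
match = re.search(signature, toy_sequence)
```

Encoding the gaps as fixed-length character classes keeps the signature a plain regular expression, so any regex engine can search it in a query sequence. Whether the published signatures use fixed or variable gap lengths is not stated in the abstract.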

RESULTS

Because matK yielded higher discrimination success than rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on the discriminating patterns identified at the genus and species levels. Our comparative assessment shows that 46 of the 60 species were correctly identified using the generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species), and RIPPER (3 species). As a final outcome of this study, we converted the signatures into QR codes and developed a software tool, matK-QR Classifier (http://www.neeri.res.in/matk_classifier/index.htm), which searches for signatures in query matK gene sequences and predicts the corresponding plant species.
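The signature search performed by the classifier might look roughly like the following Python sketch. It only illustrates the idea of matching regular-expression signatures against a query sequence and is not the matK-QR Classifier's actual implementation. The species labels, the signature strings, and the function name `predict_species` are hypothetical placeholders.

```python
import re

# Hypothetical signatures; names and patterns are placeholders, not from the paper.
SIGNATURES = {
    "Species A": "ATGGA[ACGTN]{3}TTCAC",
    "Species B": "GGCTA[ACGTN]{2}CCGTT",
}


def predict_species(query, signatures):
    """Return every species whose signature regex occurs in the query sequence."""
    query = query.upper()  # accept lower-case sequence input
    return [name for name, pattern in signatures.items()
            if re.search(pattern, query)]


hits = predict_species("ccatggaaaattcacgt", SIGNATURES)
no_hits = predict_species("AAAAAAAAAA", SIGNATURES)
```

Returning a list rather than a single name makes ambiguous or unmatched queries visible to the caller: an empty list means no signature matched, and more than one entry means the signatures did not discriminate the query.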

CONCLUSIONS

This novel approach of employing pattern-based signatures opens new avenues for the classification of species. In addition to existing methods, we believe that the matK-QR Classifier will be a valuable tool for molecular taxonomists, enabling precise identification of plant species.
