Department of Biology, School of Basic Sciences and Bioinformatics Research Group, University of Qom, Qom, Iran.
PLoS One. 2012;7(9):e44164. doi: 10.1371/journal.pone.0044164. Epub 2012 Sep 5.
Various methods have been used to identify cultivars of olive trees; here we used different bioinformatics algorithms to propose new tools for classifying 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset assigned every olive cultivar to the correct class. All 176 trees induced by decision tree models were meaningful, and the UBC841a4 attribute distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) also predicted the correct class of olive cultivars with 100% accuracy. Our results show, for the first time, that data mining techniques can be used effectively to distinguish between plant cultivars, and that the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
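To make concrete how a single marker can separate two groups of cultivars, the following is a minimal sketch of a one-level decision tree (a decision stump) applied to presence/absence band scores. It is an illustration only, not the authors' pipeline: the marker names `M1` to `M3`, the six samples and their labels are all hypothetical, and the accuracy reported is on the same samples used to build the rule rather than on held-out data.

```python
from typing import Dict, List, Tuple

# Hypothetical presence/absence (1/0) band scores; NOT data from the study.
MARKERS = ["M1", "M2", "M3"]
SAMPLES: List[Tuple[List[int], str]] = [
    ([1, 0, 1], "foreign"),
    ([1, 1, 0], "foreign"),
    ([1, 0, 0], "foreign"),
    ([0, 1, 1], "domestic"),
    ([0, 0, 1], "domestic"),
    ([0, 1, 0], "domestic"),
]


def stump_accuracy(column: int, samples) -> Tuple[float, Dict[int, str]]:
    """Score a one-level decision tree that splits on one binary marker."""
    rule: Dict[int, str] = {}
    correct = 0
    for value in (0, 1):
        labels = [label for bands, label in samples if bands[column] == value]
        if not labels:
            continue
        # Each branch predicts the majority class among its samples;
        # sorting first makes tie-breaking deterministic.
        majority = max(sorted(set(labels)), key=labels.count)
        rule[value] = majority
        correct += labels.count(majority)
    return correct / len(samples), rule


def best_marker(markers, samples):
    """Return (marker name, accuracy, rule) for the most discriminating marker."""
    scored = [(stump_accuracy(i, samples), name) for i, name in enumerate(markers)]
    (accuracy, rule), name = max(scored, key=lambda item: item[0][0])
    return name, accuracy, rule


if __name__ == "__main__":
    name, accuracy, rule = best_marker(MARKERS, SAMPLES)
    print(f"best marker: {name}, accuracy: {accuracy:.0%}, rule: {rule}")
```

In this toy table the band for `M1` is present in every "foreign" sample and absent in every "domestic" one, so the stump built on it classifies all six samples correctly, while the other two markers each misclassify two samples.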