Golabi Faegheh, Mehdizadeh Aghdam Elnaz, Shamsi Mousa, Sedaaghi Mohammad Hossein, Barzegar Abolfazl, Hejazi Mohammad Saeid
Genomic Signal Processing Laboratory, Faculty of Biomedical Engineering, Sahand University of Technology, Tabriz, Iran.
Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
Bioimpacts. 2021;11(2):101-109. doi: 10.34172/bi.2021.17. Epub 2020 Apr 17.
Riboswitches are short regulatory elements, generally found in the untranslated regions of prokaryotic mRNAs, that are classified into several families. Because riboswitches can bind antibiotics, can serve as engineered regulatory elements, and are of evolutionary significance, there is a growing need for bioinformatics tools that detect them. We have previously introduced an alignment-independent algorithm for identifying frequent sequential blocks in riboswitch families. Herein, we report the application of a block location-based feature extraction (BLBFE) strategy, which uses the locations of the detected blocks on riboswitch sequences as features for classifying seed sequences. Several other feature extraction strategies were also investigated: mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General, and SC-PseDNC-General. Decision tree, KNN, LDA, and Naïve Bayes classifiers, together with k-fold cross-validation, were applied to every feature extraction method, and their performance was compared in terms of accuracy, sensitivity, specificity, and F-score. The BLBFE strategy classified the riboswitches with an average correct classification rate (CCR) of 87.65%. Its performance was further confirmed by average accuracy, sensitivity, specificity, and F-score values of 94.31%, 85.01%, 95.45%, and 85.38%, respectively. These results confirm the effectiveness of the BLBFE strategy in classifying and discriminating riboswitch groups, with markedly higher CCR, accuracy, sensitivity, specificity, and F-score than the previously studied feature extraction methods.
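Mono- and dinucleotide frequencies are the k = 1 and k = 2 cases of k-mer frequency features. The Python sketch below shows one common way to compute such a feature vector. It is not the authors' implementation: the ACGU alphabet ordering, the conversion of T to U, the normalization (dividing by the number of valid windows), and the function names are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGU") -> list[float]:
    """Normalized frequency of every k-mer over `alphabet`, in lexicographic
    order (4**k values for a 4-letter alphabet). Hypothetical helper."""
    seq = seq.upper().replace("T", "U")  # assumption: DNA input is read as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # assumption: normalize by the number of valid (unambiguous) windows
    return [counts[m] / total for m in kmers]


def feature_vector(seq: str, ks: tuple[int, ...] = (1, 2)) -> list[float]:
    """Concatenate k-mer frequency vectors; (1, 2) gives mono- plus dinucleotide
    frequencies (4 + 16 = 20 features). Hypothetical helper."""
    features: list[float] = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features
```

For example, `feature_vector("GGAUCC")` returns 20 values: the four mononucleotide frequencies (A, C, G, U) followed by the 16 dinucleotide frequencies. Because every sequence maps to a vector of the same length, sequences of different lengths can be fed to the same classifier.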
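To make the evaluation protocol concrete, the following sketch runs k-fold cross-validation of a KNN classifier on precomputed feature vectors and reports the fraction of correctly classified sequences (the CCR). The abstract does not state these settings, so the number of folds, the number of neighbours, the Euclidean distance, the plain (non-stratified) shuffled folds, and all function names are assumptions for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle range(n) and deal it into k folds of near-equal size (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k: int = 3):
    """Majority vote among the k training points nearest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validated_ccr(X, y, folds: int = 5, k_neighbors: int = 3, seed: int = 0) -> float:
    """Each sample is predicted exactly once, by a model trained on the other folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), folds, seed):
        test_set = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in test_set]
        train_y = [y[i] for i in range(len(X)) if i not in test_set]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]:
                correct += 1
    return correct / len(X)
```

Since every sample appears in exactly one test fold, the returned value is the proportion of all sequences classified correctly under cross-validation. Any other classifier (for example a decision tree or Naïve Bayes) could replace `knn_predict` without changing the fold logic.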
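The performance measures named in the abstract can be computed from predicted and true family labels. The sketch below assumes, without confirmation from the abstract, that the multi-class averages are macro averages of one-vs-rest scores per riboswitch family. The CCR is taken as the overall fraction of correct predictions. The function names are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, labels):
    """Per-class accuracy, sensitivity, specificity and F-score, treating each
    class in turn as positive and all other classes as negative."""
    n = len(y_true)
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        sens = tp / (tp + fn) if tp + fn else 0.0   # recall / true positive rate
        spec = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        per_class[c] = {"accuracy": (tp + tn) / n, "sensitivity": sens,
                        "specificity": spec, "f_score": f1}
    return per_class


def macro_average(per_class):
    """Unweighted mean of each measure over all classes (assumed averaging scheme)."""
    keys = ["accuracy", "sensitivity", "specificity", "f_score"]
    return {k: sum(m[k] for m in per_class.values()) / len(per_class) for k in keys}


def correct_classification_rate(y_true, y_pred):
    """Overall fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

As a toy example, with `y_true = ["A", "A", "B", "B", "C", "C"]` and `y_pred = ["A", "B", "B", "B", "C", "A"]`, the CCR is 4/6 (about 0.67), while the macro-averaged one-vs-rest accuracy is 14/18 (about 0.78). The difference arises because each class's true negatives count towards its own accuracy. If the reported averages were computed this way, that would be consistent with the average accuracy (94.31%) exceeding the average CCR (87.65%), but this remains an assumption.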