Meher Prabina Kumar, Sahu Tanmaya Kumar, Banchariya Anjali, Rao Atmakuri Ramakrishna
Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
BMC Bioinformatics. 2017 Mar 24;18(1):190. doi: 10.1186/s12859-017-1587-y.
Insecticide resistance is a major challenge for insect pest control programs in crop protection and in human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available for distinguishing insecticide resistant proteins from non-resistant proteins. Such a tool would help predict insecticide resistant proteins, which can then be targeted when developing appropriate insecticides.
Five feature sets, namely amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded vectors were then used as input to a support vector machine (SVM) for classifying insecticide resistant and non-resistant proteins. The RBF kernel gave higher accuracies than the other kernels, and the DPC feature set gave higher accuracies than the other feature sets. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Each of the two classes of resistant proteins, detoxification-based and target-based, was discriminated from non-resistant proteins with >95% accuracy, and the two classes were also discriminated from each other with >95% accuracy. The proposed approach not only outperformed the Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed on an independent dataset of 75 insecticide resistant proteins.
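To make the sequence encoding concrete, the following is a minimal Python sketch of the two simplest feature sets, AAC and DPC, using their usual textbook definitions (residue and adjacent-pair frequencies). The abstract does not give the exact formulas used in the study, so the function names `aac` and `dpc`, the handling of non-standard residues, and the normalization are assumptions made for illustration only.

```python
from itertools import product

# The 20 standard amino acids; any other symbol is ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dpc(seq):
    """Di-peptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard residue are skipped, not merged.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "ACAC"  # hypothetical toy sequence, not from the study's dataset
    print(len(aac(toy)), len(dpc(toy)))
```

Each sequence, whatever its length, becomes a fixed-length vector (20 values for AAC, 400 for DPC), which is what allows a kernel classifier to compare sequences directly. Such vectors could then be passed to any SVM implementation with an RBF kernel, for example scikit-learn's `SVC(kernel="rbf")`; the software actually used by the authors is not stated in the abstract.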
This paper presents the first computational approach for discriminating insecticide resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server, DIRProt, has also been developed for computational prediction of insecticide resistant proteins; it is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is expected to supplement wet-lab efforts to develop dynamic insecticides by targeting insecticide resistant proteins.