School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China.
Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao 266200, China.
Toxins (Basel). 2022 Nov 21;14(11):811. doi: 10.3390/toxins14110811.
Peptide toxins generally have potent pharmacological activities and are a rich source of drug leads. However, determining the optimal activity of a new peptide can be a long and expensive process. In this study, peptide toxins were retrieved from UniProt, and three positive-unlabeled (PU) learning schemes (adaptive base classifier, two-step method, and PU bagging) were adopted to develop models for predicting the biological functions of new peptide toxins. Each of the three schemes was combined with 14 machine learning classifiers. The prediction results of the adaptive base classifier and the two-step method were highly consistent. The models with the best overall performance were further optimized by feature selection and hyperparameter tuning, and were validated by making predictions for 61 three-finger toxins or the external HemoPI dataset. Biological functions that these models can identify include cardiotoxicity, vasoactivity, lipid binding, hemolysis, neurotoxicity, postsynaptic neurotoxicity, hypotension, and cytolysis, with relatively weak predictions for hemostasis and presynaptic neurotoxicity. These models serve as discovery and prediction tools for active peptide toxins and are expected to accelerate the development of peptide toxins as drugs.
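As an illustration of the PU bagging scheme named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy two-dimensional feature space and uses a nearest-centroid rule as a stand-in base learner (the abstract does not say which of the 14 classifiers performed best). The function names `centroid`, `sq_dist`, and `pu_bagging_scores`, the toy data, and the default number of rounds are all hypothetical. In each round, a bootstrap sample of the unlabeled set is treated as provisional negatives, a model is fitted against the known positives, and only the unlabeled points left out of the sample are scored; the scores are then averaged across rounds.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=0):
    """Return, for each unlabeled point, the fraction of out-of-bag rounds
    in which it was classified as positive (None if never out of bag)."""
    rng = random.Random(seed)
    pos_centroid = centroid(positives)
    votes = [0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Bootstrap sample of unlabeled indices, same size as the positive
        # set, treated as provisional negatives for this round.
        sample = [rng.randrange(len(unlabeled)) for _ in positives]
        in_bag = set(sample)
        neg_centroid = centroid([unlabeled[i] for i in sample])
        # Score only the unlabeled points that were NOT drawn this round.
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue
            counts[i] += 1
            if sq_dist(x, pos_centroid) < sq_dist(x, neg_centroid):
                votes[i] += 1
    return [v / c if c else None for v, c in zip(votes, counts)]


# Toy data (hypothetical): known positives cluster near (1, 1); the
# unlabeled set hides two positives (indices 0 and 1) among negatives
# near the origin.
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.0, 0.8]]
unlabeled = [
    [0.95, 1.05], [1.05, 0.95],
    [0.0, 0.1], [0.1, 0.0], [-0.1, 0.0],
    [0.0, -0.1], [0.1, 0.1], [-0.1, -0.1],
]

scores = pu_bagging_scores(positives, unlabeled)
for point, score in zip(unlabeled, scores):
    print(point, score)
```

In this toy setting the two hidden positives should receive scores close to 1 and the remaining points scores close to 0. In practice the centroid rule would be replaced by a real classifier trained on sequence-derived features, and the number of rounds and the sample size would need tuning.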