Zhu Wen, Yuan Shi-Shi, Li Jian, Huang Cheng-Bing, Lin Hao, Liao Bo
Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China.
Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China.
Diagnostics (Basel). 2023 Jul 24;13(14):2465. doi: 10.3390/diagnostics13142465.
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from polymorphonuclear neutrophils and an important biomarker of infectious diseases. Correctly identifying HBP is therefore important for the study of infectious diseases. This work presents the first machine learning-based framework for accurately identifying HBP. Using four sequence descriptors, HBP and non-HBP samples were encoded as numerical feature vectors. These features were fed into support vector machine (SVM) and random forest (RF) classifiers. Comparing the predictive performance of the two methods on training data and on independent test data showed that the SVM-based classifier is the better suited to identifying HBP. The model achieved an auROC of 0.981 ± 0.028 in 10-fold cross-validation on the training data and an overall accuracy of 95.0% on the independent test data. As the first model for HBP recognition, it may aid research on infectious diseases and stimulate further work in related fields.
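The abstract does not name the four descriptors or the classifier settings. The sketch below assumes scikit-learn and uses amino acid composition (AAC) as a stand-in for one sequence descriptor. It shows how an SVM and an RF could be compared by 10-fold cross-validated auROC, reported as mean ± standard deviation.

Several parts of the sketch are hypothetical illustrations, not the authors' method:

* the toy sequences;
* the helper names `aac`, `toy_dataset` and `compare_classifiers`;
* the RBF kernel and the hyperparameters.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues.

    Assumption: AAC stands in for one of the paper's (unnamed) descriptors.
    """
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [seq.count(a) / n for a in AMINO_ACIDS]


def toy_dataset(n_per_class=40, length=60, seed=0):
    """Hypothetical toy data, not real HBP sequences.

    'Positives' are drawn with extra cationic residues (K, R) in the pool.
    """
    rng = random.Random(seed)
    pos_pool = AMINO_ACIDS + "KR" * 5
    seqs, labels = [], []
    for label, pool in ((1, pos_pool), (0, AMINO_ACIDS)):
        for _ in range(n_per_class):
            seqs.append("".join(rng.choice(pool) for _ in range(length)))
            labels.append(label)
    return seqs, labels


def compare_classifiers(seqs, labels, folds=10, seed=0):
    """Return {model name: (mean auROC, std auROC)} over stratified k-fold CV."""
    X = np.array([aac(s) for s in seqs])
    y = np.array(labels)
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    # Assumed settings; the abstract does not report the kernel or hyperparameters.
    models = {
        "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # "roc_auc" scores the SVM through its decision_function.
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = (scores.mean(), scores.std())
    return results


if __name__ == "__main__":
    seqs, labels = toy_dataset()
    for name, (mean, std) in compare_classifiers(seqs, labels).items():
        print(f"{name}: auROC = {mean:.3f} ± {std:.3f}")
```

For the independent-test step, one could fit the chosen model on all training features. Its `predict` output on the held-out set could then be scored with `sklearn.metrics.accuracy_score`.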