Fu Xiuhao, Duan Hao, Zang Xiaofeng, Liu Chunling, Li Xingfeng, Zhang Qingchen, Zhang Zilong, Zou Quan, Cui Feifei
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1897-1910. doi: 10.1109/TCBB.2024.3425644. Epub 2024 Dec 10.
Tuberculosis has plagued mankind since ancient times, and the struggle between humans and the disease continues. Mycobacterium tuberculosis, the main cause of tuberculosis, infects nearly one-third of the world's population. The rise of peptide drugs has opened a new direction in tuberculosis treatment, which makes the prediction of anti-tuberculosis peptides crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, random forest (RF) and extremely randomized trees (ERT) are selected as the first-level learners of the stacked ensemble. Next, the five best-performing feature encoding methods are combined into a hybrid feature vector, which is refined by recursive feature elimination with a decision tree (DT-RFE). The resulting optimal feature subset is used as the input of the stacked ensemble model. Logistic regression (LR) serves as the second-level learner, giving the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc achieved prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively.
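The pipeline described in the abstract (DT-RFE feature selection, then RF and ERT as first-level learners, then LR as the second-level learner) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature matrix is synthetic rather than derived from peptide encodings, and the number of retained features, the elimination step, the tree counts, and the cross-validation folds are all arbitrary placeholder values that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for a hybrid feature matrix.
# Real inputs would be concatenated peptide feature encodings.
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=12, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# DT-RFE: recursive feature elimination ranked by decision-tree importances.
# Keeping 20 features and dropping 5 per round are illustrative choices.
dt_rfe = RFE(
    estimator=DecisionTreeClassifier(random_state=0),
    n_features_to_select=20,
    step=5,
)

# First-level learners (RF, ERT) feed out-of-fold class probabilities
# to the second-level logistic regression.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ert", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Feature selection is fitted on the training split only, then the
# selected subset is passed to the stacked ensemble.
model = Pipeline([("dt_rfe", dt_rfe), ("stack", stack)])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Wrapping the selector and the stacked ensemble in one `Pipeline` keeps the held-out split out of feature selection, so the reported score is not inflated by leakage. The accuracy printed here reflects only the synthetic data and says nothing about the figures reported for Hyb_SEnc.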