IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan-Feb;15(1):147-156. doi: 10.1109/TCBB.2016.2615010. Epub 2016 Oct 4.
Heme is an essential biomolecule found in a wide range of extant organisms. Accurately identifying heme binding residues (HEMEs) is important for understanding disease progression and for drug development. In this study, a novel predictor named HEMEsPred is proposed for predicting HEMEs. First, several sequence- and structure-based features, including amino acid composition, motifs, surface preferences, and secondary structure, were collected to construct feature matrices. Second, a novel fast-adaptive ensemble learning scheme was designed to overcome the severe class-imbalance problem and to enhance prediction performance. Third, we developed ligand-specific models, because different heme ligands vary significantly in their roles, sizes, and distributions. Statistical tests confirmed the effectiveness of the ligand-specific models. Experimental results on benchmark datasets demonstrated the robustness of the proposed method. Furthermore, the method showed good generalization capability and outperformed many state-of-the-art predictors on two independent testing datasets. The HEMEsPred web server was made available at http://www.inforstation.com/HEMEsPred/ for free academic use.
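The class-imbalance problem mentioned in the abstract arises because binding residues are far rarer than non-binding ones. The abstract does not specify how the fast-adaptive ensemble scheme works, so the following is only a minimal sketch of a generic technique for the same problem: an undersampling ensemble in which each base learner sees all positives plus an equally sized random subset of negatives, with predictions combined by majority vote. The nearest-centroid base learner, the toy two-dimensional data, and all function names are assumptions made for illustration and are not taken from HEMEsPred.

```python
import random


def make_toy_data(n_pos=20, n_neg=400, seed=1):
    """Hypothetical imbalanced data: positives near (2, 2), negatives near (0, 0)."""
    rng = random.Random(seed)
    pos = [([rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)], 1) for _ in range(n_pos)]
    neg = [([rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)], 0) for _ in range(n_neg)]
    return pos + neg


def train_centroid(samples):
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))


def train_balanced_ensemble(samples, n_models=11, seed=0):
    """Train n_models base learners, each on all positives plus an equally
    sized random subset of negatives (random undersampling)."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    models = []
    for _ in range(n_models):
        subset = rng.sample(negatives, len(positives))
        models.append(train_centroid(positives + subset))
    return models


def predict_ensemble(models, features):
    """Majority vote over the base learners (labels are 0 or 1)."""
    votes = sum(predict_centroid(m, features) for m in models)
    return 1 if 2 * votes > len(models) else 0


if __name__ == "__main__":
    data = make_toy_data()
    models = train_balanced_ensemble(data)
    print(predict_ensemble(models, [2.0, 2.0]), predict_ensemble(models, [0.0, 0.0]))
```

Each base learner trains on a balanced subset, so no single model is dominated by the majority class, while the ensemble as a whole still draws on many different negatives. An odd number of base learners avoids tied votes.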