IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):2046-2056. doi: 10.1109/TCBB.2018.2824332. Epub 2018 Apr 9.
The function of a flavoprotein is determined to a great extent by the binding sites on its surface that interact with flavin adenine dinucleotide (FAD). Malfunction or dysregulation of FAD binding leads to a range of diseases. Therefore, accurately identifying FAD-interacting residues (FIRs) provides insight into the molecular mechanisms of flavoprotein-related biological processes and disease progression. In this paper, a new computational method is proposed for identifying FIRs from protein sequences. Various sequence-derived discriminative features are explored. We analyze how these features differ between FIRs and non-FIRs. We also investigate the predictive capabilities of both individual features and combinations of features. The Relief algorithm followed by incremental feature selection (relief-IFS) is then adopted to search for the optimal feature subset. Finally, a random forest (RF) module is used to predict FIRs based on the optimal features. In a 5-fold cross-validation test, the proposed method achieves a sensitivity of 0.847, a specificity of 0.933, an accuracy of 0.890, and a Matthews correlation coefficient (MCC) of 0.782, thereby outperforming previous methods. These results indicate that our method is relatively successful at predicting FIRs.
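The selection and prediction steps described above (Relief ranking, incremental feature selection, then a random forest scored by 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `relief_weights` and `incremental_feature_selection` are hypothetical, the data are synthetic (generated with scikit-learn's `make_classification`) rather than real sequence-derived features, the Relief variant is the basic binary one with Manhattan distance, and the choice of MCC as the selection criterion and the forest size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score


def relief_weights(X, y, n_samples=100, seed=0):
    """Basic binary Relief (hypothetical helper): reward features that differ
    on the nearest miss and penalize features that differ on the nearest hit."""
    rng = np.random.default_rng(seed)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    weights = np.zeros(X.shape[1])
    picks = rng.choice(len(Xs), size=min(n_samples, len(Xs)), replace=False)
    for i in picks:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all rows
        dist[i] = np.inf                       # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class row
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class row
        weights += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return weights / len(picks)


def incremental_feature_selection(X, y, ranking, cv=5, seed=0):
    """Add features in ranked order and keep the prefix with the best
    cross-validated MCC (assumed criterion) under a random forest."""
    scorer = make_scorer(matthews_corrcoef)
    best_score, best_k = -np.inf, 0
    for k in range(1, len(ranking) + 1):
        clf = RandomForestClassifier(n_estimators=30, random_state=seed)
        score = cross_val_score(clf, X[:, ranking[:k]], y, cv=cv, scoring=scorer).mean()
        if score > best_score:
            best_score, best_k = score, k
    return ranking[:best_k], best_score


# Synthetic stand-in for per-residue feature vectors (1 = FIR, 0 = non-FIR).
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
ranking = np.argsort(relief_weights(X, y))[::-1]  # highest Relief weight first
selected, score = incremental_feature_selection(X, y, ranking)
print("selected feature indices:", selected.tolist())
print("5-fold CV MCC with selected features: %.3f" % score)
```

Evaluating every prefix of the ranked list keeps the search linear in the number of features, which is the usual motivation for pairing a filter ranking with incremental selection instead of testing all subsets.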
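The four reported metrics all derive from the same confusion matrix, and a short sketch makes the relationship concrete. The counts below are hypothetical, not the paper's data: they assume 1000 FIRs and 1000 non-FIRs, chosen only so that the sensitivity and specificity match the reported 0.847 and 0.933. The balanced-class assumption is at least consistent with the reported accuracy (0.890) being the mean of those two rates. The helper name `binary_metrics` is likewise illustrative.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true FIRs recovered
    specificity = tn / (tn + fp)  # fraction of non-FIRs correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical balanced counts (assumption): 1000 positives, 1000 negatives.
metrics = binary_metrics(tp=847, fn=153, tn=933, fp=67)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the MCC comes out at about 0.783, close to the reported 0.782. The small gap is what one would expect from rounding in the published rates and from averaging over cross-validation folds.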