Ugurlu Sadettin Y, McDonald David, He Shan
School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.
AIA Insights Ltd, Birmingham, UK.
J Cheminform. 2024 Oct 23;16(1):116. doi: 10.1186/s13321-024-00882-5.
Allostery is a crucial mechanism for controlling protein activity. Compared with orthosteric ligands, allosteric modulators can offer many benefits, such as increased selectivity and saturability of their effect. Identifying new allosteric sites opens prospects for innovative medications and deepens our understanding of fundamental biological mechanisms. Allosteric sites are increasingly being found in different protein families through various techniques, including machine learning applications, which opens up possibilities for entirely novel medications with a wide variety of chemical structures. However, machine learning methods such as PASSer show limited efficacy in accurately finding allosteric binding sites when they rely solely on 3D structural information.
Scientific Contribution
Integrating supporting amino-acid-based information with 3D structural knowledge before conducting feature selection is advantageous for allosteric binding site identification, because it can improve both accuracy and robustness. We therefore developed an accurate and robust model called Multimodel Ensemble Feature Selection for Allosteric Site Identification (MEF-AlloSite) after collecting 9460 relevant and diverse features from the literature to characterise pockets. To improve predictive performance despite a small training set of only 90 proteins, the model employs a multimodal feature selection technique. This state-of-the-art technique increased performance in allosteric binding site identification by selecting promising features from the 9460 candidates. Analysing the selected features and their relationship to allosteric binding sites also sheds light on the complex allostery of proteins.
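The abstract does not spell out how the ensemble combines its models, so the following is only a generic sketch of ensemble feature selection, not the MEF-AlloSite procedure. It assumes each model has already produced an importance score for every feature, and it keeps the features with the best mean rank across models. The function names, the mean-rank aggregation rule, and the three feature names are hypothetical choices made for illustration.

```python
from collections import defaultdict


def rank_features(scores):
    """Map each feature to its rank (1 = most important) under one model."""
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def ensemble_select(per_model_scores, k):
    """Keep the k features with the best (lowest) mean rank across models.

    Assumes every model scores the same set of features.
    """
    rank_sums = defaultdict(float)
    for scores in per_model_scores:
        for feature, rank in rank_features(scores).items():
            rank_sums[feature] += rank
    n_models = len(per_model_scores)
    mean_rank = {f: total / n_models for f, total in rank_sums.items()}
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical importance scores from three different models.
model_scores = [
    {"volume": 0.9, "hydrophobicity": 0.4, "depth": 0.1},
    {"volume": 0.5, "hydrophobicity": 0.7, "depth": 0.2},
    {"volume": 0.8, "hydrophobicity": 0.3, "depth": 0.6},
]
selected = ensemble_select(model_scores, k=2)
print(selected)
```

Aggregating ranks rather than raw scores keeps models with different score scales comparable, which is one common reason to prefer it when the training set is small and individual rankings are noisy.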
MEF-AlloSite and state-of-the-art allosteric site identification methods such as PASSer2.0 and PASSerRank were tested on three test cases 51 times, each time with a different split of the training set. Student's t-test and Cohen's d were used to compare the resulting distributions of average precision and ROC AUC scores. Across the three test cases, most of the p-values ( ) and the majority of Cohen's d values ( ) showed that MEF-AlloSite's 1-6% higher mean average precision and ROC AUC, relative to state-of-the-art allosteric site identification methods, is statistically significant.
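As a rough illustration of the comparison described above, the sketch below computes a two-sample Student's t statistic and Cohen's d for two sets of scores using only the Python standard library. It assumes independent samples with equal variances and a pooled standard deviation; the abstract does not say whether the authors used a paired or unpaired test. The score lists are made-up placeholders, not results from the paper.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled sample standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def cohens_d(a, b):
    """Effect size: difference in means in units of the pooled SD."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Two-sample Student's t statistic (equal-variance assumption)."""
    na, nb = len(a), len(b)
    standard_error = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Placeholder scores for two methods over repeated runs (not real data).
method_a = [0.82, 0.85, 0.81, 0.84, 0.83]
method_b = [0.79, 0.80, 0.78, 0.82, 0.80]

d = cohens_d(method_a, method_b)
t = student_t(method_a, method_b)
print(f"Cohen's d = {d:.3f}, t = {t:.3f}")
```

Turning the t statistic into a p-value requires the t distribution with `len(a) + len(b) - 2` degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_ind` returns both the statistic and the p-value if SciPy is available.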