Shoombuatong Watshara, Schaduangrat Nalini, Nikom Jaru
Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700.
Research Methodology and Data Analytics Program, Faculty of Science & Technology, Prince of Songkla University, Pattani, Thailand, 94000.
EXCLI J. 2023 Aug 29;22:915-927. doi: 10.17179/excli2023-6410. eCollection 2023.
Efficient and precise identification of drug targets is crucial for discovering and developing potential medications. Conventional experimental approaches can pinpoint these targets accurately, but they are time-consuming and not easily adapted to high-throughput processes. Computational approaches, particularly those based on machine learning (ML), offer an efficient way to accelerate the prediction of druggable proteins from their primary sequences alone. Several state-of-the-art computational methods have recently been developed for predicting and analyzing druggable proteins. These methods differ widely in their benchmark datasets, feature extraction schemes, ML algorithms, evaluation strategies, and webserver/software usability. Our objective is therefore to re-examine these approaches and comprehensively assess their strengths and weaknesses across these aspects. In this study, we deliver the first comprehensive survey of state-of-the-art computational approaches for predicting druggable proteins. First, we describe the existing benchmark datasets and the types of ML methods employed. Second, we investigate how effectively these methods identify druggable proteins on each benchmark dataset. Third, we summarize the important features used in this field and the existing webservers/software. Finally, we discuss the current limitations of the existing methods and offer guidance to the scientific community for designing and developing novel prediction models. We anticipate that this review will provide crucial information for the development of more accurate and efficient druggable protein predictors.
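To make the idea of predicting from primary sequence alone more concrete, the following is a minimal sketch of one widely used sequence descriptor, amino acid composition. The abstract does not name the specific features the surveyed methods use, so the choice of descriptor, the function name `amino_acid_composition`, and the example sequence are illustrative assumptions and not any reviewed predictor's method.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes (fixed order defines the feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector with the fraction of each standard residue.

    Hypothetical helper for illustration only; non-standard characters are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up example sequence (an assumption, not taken from any benchmark dataset).
features = amino_acid_composition("MKTAYIAKQR")
```

A fixed-length vector such as `features` is the kind of input an ML classifier could be trained on to separate druggable from non-druggable proteins; real predictors typically combine several richer descriptors.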