Charoenkwan Phasit, Kanthawong Sakawrat, Nantasenamat Chanin, Hasan Md Mehedi, Shoombuatong Watshara
Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand.
Genomics. 2021 Jan;113(1 Pt 2):689-698. doi: 10.1016/j.ygeno.2020.09.065. Epub 2020 Oct 2.
Fast, accurate identification and characterization of amyloid proteins at large scale is essential for understanding their role in therapeutic intervention strategies. To date, only one in silico model exists for amyloid protein identification: RFAmy, which uses a random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method whose accuracy is robust relative to the existing, more complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. iAMY-SCM uses a simple weighted-sum function in conjunction with the propensity scores of dipeptides to identify amyloid proteins. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, corresponding to 10-22% higher performance than that of widely used machine learning models. In addition, iAMY-SCM achieved an accuracy of 0.827 on an independent test, which was comparable to that of RFAmy and approximately 9-13% higher than that of widely used machine learning models. Furthermore, the estimated propensity scores of amino acids and dipeptides were analyzed to provide insights into the biophysical and biochemical properties of amyloid proteins. These results demonstrate that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server has been established at http://camt.pythonanywhere.com/iAMY-SCM. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.
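The weighted-sum scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy propensity values and the decision threshold are hypothetical, and the procedure iAMY-SCM uses to estimate the dipeptide propensity scores is not shown. The sketch assumes that the weights are the dipeptide composition of the sequence and that a sequence is called amyloid when its score exceeds a threshold.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 dipeptides in the sequence."""
    seq = sequence.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    total = 0
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2]
        if dp in counts:  # skip windows containing non-standard residues
            counts[dp] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: n / total for dp, n in counts.items()}


def scm_score(sequence, propensity):
    """Weighted sum: dipeptide composition times dipeptide propensity score.

    `propensity` maps dipeptides to scores; missing dipeptides count as 0
    (an assumption made for this sketch only).
    """
    comp = dipeptide_composition(sequence)
    return sum(comp[dp] * propensity.get(dp, 0.0) for dp in DIPEPTIDES)


def predict_amyloid(sequence, propensity, threshold):
    """Call a sequence amyloid when its score exceeds the threshold."""
    return scm_score(sequence, propensity) > threshold


if __name__ == "__main__":
    # Toy, made-up propensity scores; real scores would be estimated from data.
    toy_propensity = {"VV": 900.0, "VI": 800.0, "KK": 100.0}
    print(scm_score("VVVIVV", toy_propensity))
    print(predict_amyloid("VVVIVV", toy_propensity, threshold=500.0))
```

Because the score is a plain sum of per-dipeptide contributions, each dipeptide's share of a prediction can be read off directly, which is the kind of interpretability the abstract emphasizes.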