Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, 7098 Liuxian Street, Shenzhen, 518055, China.
Amino Acids. 2022 May;54(5):799-809. doi: 10.1007/s00726-022-03145-5. Epub 2022 Mar 14.
Autophagy plays an important role in biological evolution and is regulated by many autophagy proteins. Accurate identification of autophagy proteins is crucial for revealing their biological functions. Because experimental methods are expensive and labor-intensive, there is an urgent need for automated, accurate, and reliable sequence-based computational tools that can identify novel autophagy proteins among large numbers of proteins and peptides. For this purpose, we propose a new predictor, ATGPred-FL, for the efficient identification of autophagy proteins. We investigated various sequence-based feature descriptors and adopted a feature learning method to generate corresponding, more informative probability features. A two-step, accuracy-based feature selection strategy was then used to remove irrelevant and redundant features, yielding the most discriminative 14-dimensional feature set. The final predictor was built with a support vector machine classifier and performed favorably on both the training and testing sets, with accuracies of 94.40% and 90.50%, respectively. ATGPred-FL is the first ATG machine learning predictor based on protein primary sequences. We envision that ATGPred-FL will be an effective and useful tool for autophagy protein identification. It is freely available at http://lab.malab.cn/~acy/ATGPred-FL, and the source code and datasets are accessible at https://github.com/jiaoshihu/ATGPred.
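The feature learning step described in the abstract (one base model per sequence descriptor, with each model's predicted probability used as a new feature) can be illustrated with a minimal sketch. This is not the ATGPred-FL implementation. The abstract does not name the descriptors or base classifiers, so the two descriptors here (amino acid composition and dipeptide composition), the nearest-centroid base learner, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: 20 residue frequencies.
    Non-standard residues are ignored, so the values may sum to less than 1."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts[a] / n for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: 400 frequencies of adjacent residue pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit_base_model(descriptor, pos_seqs, neg_seqs):
    """Stand-in base learner (an assumption, not the paper's classifier):
    nearest centroid, with a logistic score of the distance difference."""
    c_pos = centroid([descriptor(s) for s in pos_seqs])
    c_neg = centroid([descriptor(s) for s in neg_seqs])

    def predict_proba(seq):
        x = descriptor(seq)
        # Closer to the positive centroid than the negative one gives > 0.5.
        return 1.0 / (1.0 + math.exp(math.dist(x, c_pos) - math.dist(x, c_neg)))

    return predict_proba


def probability_features(models, seq):
    """One probability per descriptor-specific base model."""
    return [model(seq) for model in models]


# Hypothetical toy sequences, not taken from the ATGPred-FL datasets.
positives = ["MKKLLKKA", "KKMLKKLA"]
negatives = ["GGSSGGSS", "SGSGGSGS"]

models = [fit_base_model(d, positives, negatives) for d in (aac, dpc)]
features = probability_features(models, "MKKLLKKA")
print(features)  # two values in (0, 1), one per descriptor
```

In a pipeline of the kind the abstract outlines, a probability vector like `features` would then pass through feature selection before reaching the final classifier, which the abstract states is a support vector machine. A real base learner would also be fitted with cross-validation so that the probability features for training sequences are not produced by a model that has already seen them.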