Wang Chao, Yang Qiang
Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu, China.
Yibin University, Yibin, China.
Comput Biol Med. 2023 May;158:106798. doi: 10.1016/j.compbiomed.2023.106798. Epub 2023 Mar 21.
Protein phosphorylation plays a vital role in signal transduction pathways and diverse cellular processes. To date, a large number of in silico tools have been designed for phosphorylation site identification, but few of them are suitable for identifying fungal phosphorylation sites. This largely hampers the functional investigation of fungal phosphorylation. In this paper, we present ScerePhoSite, a machine learning method for fungal phosphorylation site identification. Sequence fragments are represented by hybrid physicochemical features, and LightGBM (LGB)-based feature importance combined with the sequential forward search method is then used to choose the optimal feature subset. As a result, ScerePhoSite outperforms currently available tools and shows more robust and balanced performance. Furthermore, the impact and contribution of specific features on model performance were investigated using SHAP values. We expect ScerePhoSite to be a useful bioinformatics tool that complements hands-on experiments by pre-screening possible phosphorylation sites, and that it will facilitate the functional understanding of phosphorylation modification in fungi. The source code and datasets are accessible at https://github.com/wangchao-malab/ScerePhoSite/.
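The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact procedure. The sketch assumes one common reading: features are ranked by importance, added one at a time in that order, and the best-scoring prefix is kept. The names `ranked_forward_search`, `importances`, and `score_subset` are hypothetical. In practice the importance scores might come from a fitted LightGBM model (for example its `feature_importances_` attribute) and the scoring function from cross-validation; here both are replaced with toy stand-ins so the example runs with the standard library alone.

```python
from typing import Callable, List, Sequence, Tuple


def ranked_forward_search(
    importances: Sequence[float],
    score_subset: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Add features in descending order of importance and keep the best prefix.

    importances  -- one importance score per feature (hypothetical input; could
                    be taken from a fitted gradient-boosting model)
    score_subset -- hypothetical callback that returns a quality score for a
                    list of feature indices (higher is better)
    """
    # Rank feature indices from most to least important.
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)

    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []

    for idx in ranked:
        current.append(idx)                 # grow the candidate subset by one
        score = score_subset(list(current))
        if score > best_score:              # remember the best prefix so far
            best_subset, best_score = list(current), score

    return best_subset, best_score


# Toy stand-ins (assumptions for illustration only).
toy_importances = [0.1, 0.5, 0.05, 0.3]
informative = {1, 3}


def toy_score(subset: List[int]) -> float:
    # Reward informative features, penalize subset size.
    return len(informative.intersection(subset)) - 0.1 * len(subset)


subset, score = ranked_forward_search(toy_importances, toy_score)
print(subset, round(score, 2))
```

In this toy run the two informative features rank first, so the search stops improving after the second feature and the remaining features are left out. A real pipeline would swap `toy_score` for a cross-validated metric computed on the training data.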