University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China.
School of Electronic and Communication Engineering, Shenzhen Polytechnic, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab089.
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Identifying the associated PTMs is therefore the foundation of in-depth research on the related biological mechanisms, disease treatments and drug design. Because high-throughput sequencing techniques are costly and time-consuming, machine learning-based predictors are considered an effective way to rapidly recognize potential modification sites. However, the imbalanced distribution of true and false PTM sites, known as the data imbalance problem, strongly affects the reliability and applicability of prediction tools. In this article, we conduct a systematic survey of research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful solutions to data imbalance. Then, we summarize recently proposed bioinformatics tools based on imbalanced PTM data and build a convenient website, ImClassi_PTMs (available at lab.malab.cn/~dlj/ImbClassi_PTMs/), where researchers can browse them. Moreover, we analyze the challenges facing current computational predictors and propose suggestions for improving the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to more advanced predictors in the future.
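As a rough illustration of one family of data imbalance solutions, the sketch below randomly undersamples the majority class so that true and false sites are equally represented before training. It is a minimal example and not a method taken from the survey: the function name random_undersample, the toy window identifiers and the 10:90 class ratio are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Balance a dataset by randomly discarding majority-class samples.

    Hypothetical helper for illustration only; not from the surveyed tools.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The smallest class decides how many samples every class keeps.
    n_keep = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        # Draw n_keep samples without replacement from each class.
        for sample in rng.sample(group, n_keep):
            balanced.append((sample, label))

    # Shuffle so that the classes are interleaved.
    rng.shuffle(balanced)
    new_samples = [sample for sample, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_samples, new_labels


# Toy data (assumed): 10 true PTM sites (label 1) and 90 false sites (label 0).
windows = [f"window_{i}" for i in range(100)]
labels = [1] * 10 + [0] * 90

bal_windows, bal_labels = random_undersample(windows, labels)
print(Counter(bal_labels))
```

Undersampling discards potentially informative negative sites, so in practice it is usually weighed against alternatives such as oversampling the minority class or cost-sensitive learning.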