Akmal Muhammad Aizaz, Hassan Muhammad Awais, Muhammad Shoaib, Khurshid Khaldoon S, Mohamed Abdullah
Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan.
Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan.
PeerJ Comput Sci. 2022 Sep 21;8:e1069. doi: 10.7717/peerj-cs.1069. eCollection 2022.
N-linked glycosylation is the most common type of glycosylation. It plays a significant role in identifying various diseases, such as type I diabetes and cancer, and aids drug development. Most proteins cannot perform their biological and physiological functions without undergoing this modification. Because experimental identification has limitations, it is essential to identify such sites with computational techniques. This study analyzes and synthesizes progress in discovering N-linked glycosylation sites using machine learning methods. It also explores the performance of currently available tools for predicting such sites. Almost seventy research articles published in recognized journals in the N-linked glycosylation field were shortlisted after a rigorous filtering process. The findings are reported along multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, the literature survey produced a taxonomy of N-linked sequence identification. Our study focuses on performance evaluation criteria; the importance of N-linked glycosylation motivates us to survey resources that use computational methods rather than experimental ones, given the latter's limitations.