Department of Management Studies, Coimbatore Institute of Engineering and Technology, Coimbatore, Tamilnadu, India.
Department of Pharmaceutical Analysis, PSG College of Pharmacy, Coimbatore, Tamilnadu, India.
Methods Mol Biol. 2022;2496:179-202. doi: 10.1007/978-1-0716-2305-3_10.
Posttranslational modifications (PTMs) of proteins play a significant role in human cellular functions, ranging from localization to signal transduction. Hundreds of PTMs act in a human cell, but only a few of them are well established and documented. PubMed contains thousands of papers on these PTMs, and manually assimilating the useful information is a challenge for biomedical researchers. Text mining approaches and machine learning algorithms offer an alternative: they can automatically extract the relevant information from PubMed. Protein phosphorylation is a well-established PTM with much ongoing research, and many systems already exist for extracting protein phosphorylation information. One recent approach combines text mining and machine learning in a hybrid method to extract protein phosphorylation information from PubMed. Several other common PTMs, namely glycosylation, acetylation, methylation, hydroxylation, and ubiquitination, share similar features in terms of the entities involved in the PTM process: the substrate, the enzyme, and the amino acid residue. This motivated us to repurpose and extend the text mining protocol and machine learning information extraction methodology developed for protein phosphorylation to these PTMs. In this chapter, the chemistry behind each PTM is briefly outlined, and the adaptation of the text mining protocol and machine learning algorithm to each PTM is explained.
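Because these PTMs share the same kinds of entities, a single extraction pipeline can in principle be pointed at each of them by swapping the vocabulary it looks for. The sketch below illustrates only that idea, under stated assumptions: it is not the chapter's protocol. The trigger stems in `PTM_TRIGGERS`, the residue pattern `RESIDUE_RE`, and the names `PTMMention` and `extract_ptm_mentions` are all hypothetical. It flags which PTM types a sentence mentions and normalizes amino acid residue mentions (full names, three-letter codes, one-letter codes with a position) to three-letter form. Recognizing enzymes and substrates would need a protein name recognizer, and pairing each residue with its PTM would need relation extraction; neither is shown.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical trigger stems per PTM type (illustrative, not exhaustive).
PTM_TRIGGERS = {
    "phosphorylation": ("phosphorylat",),
    "glycosylation": ("glycosylat",),
    "acetylation": ("acetylat",),
    "methylation": ("methylat",),
    "hydroxylation": ("hydroxylat",),
    "ubiquitination": ("ubiquitinat", "ubiquitylat"),
}

# Normalization tables for residues commonly modified by these PTMs.
FULL_TO_3 = {"serine": "Ser", "threonine": "Thr", "tyrosine": "Tyr", "lysine": "Lys",
             "arginine": "Arg", "asparagine": "Asn", "proline": "Pro"}
ONE_TO_3 = {"S": "Ser", "T": "Thr", "Y": "Tyr", "K": "Lys",
            "R": "Arg", "N": "Asn", "P": "Pro"}

# Full names are case-insensitive; three- and one-letter codes are case-sensitive
# so that gene names such as "p53" are not read as proline 53.
RESIDUE_RE = re.compile(
    r"\b(?:(?P<full>(?i:serine|threonine|tyrosine|lysine|arginine|asparagine|proline))[- ]?"
    r"|(?P<three>Ser|Thr|Tyr|Lys|Arg|Asn|Pro)-?"
    r"|(?P<one>[STYKRNP]))"
    r"(?P<pos>\d+)\b"
)


@dataclass
class PTMMention:
    ptm_type: str
    residues: List[str]


def normalize_residue(match: "re.Match") -> str:
    """Convert a residue match to a three-letter code plus position, e.g. Ser9."""
    if match.group("full"):
        code = FULL_TO_3[match.group("full").lower()]
    elif match.group("three"):
        code = match.group("three")
    else:
        code = ONE_TO_3[match.group("one")]
    return code + match.group("pos")


def extract_ptm_mentions(sentence: str) -> List[PTMMention]:
    """Return one mention per PTM type triggered in the sentence.

    Simplification: every residue found in the sentence is attached to every
    PTM type the sentence triggers.
    """
    lowered = sentence.lower()
    ptm_types = [ptm for ptm, stems in PTM_TRIGGERS.items()
                 if any(stem in lowered for stem in stems)]
    if not ptm_types:
        return []
    residues = [normalize_residue(m) for m in RESIDUE_RE.finditer(sentence)]
    return [PTMMention(ptm, list(residues)) for ptm in ptm_types]


if __name__ == "__main__":
    for mention in extract_ptm_mentions("Akt phosphorylates GSK3beta at Ser9."):
        print(mention)  # PTMMention(ptm_type='phosphorylation', residues=['Ser9'])
```

Adding a new PTM in this sketch means adding its trigger stems and, if needed, its target residues; the rest of the pipeline is unchanged, which mirrors the repurposing argument in the abstract.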