A Machine Learning Approach to Identify C Type Lectin Domain (CTLD) Containing Proteins.

Affiliations

Department of Biotechnology, Panjab University, Sector-25, Chandigarh, 160014, India.

University Institute of Engineering & Technology, Panjab University, Sector-25, Chandigarh, 160014, India.

Publication information

Protein J. 2024 Aug;43(4):718-725. doi: 10.1007/s10930-024-10224-x. Epub 2024 Jul 28.

DOI:10.1007/s10930-024-10224-x

PMID:39068630

Abstract

Lectins are sugar-binding proteins that bind specific glycans reversibly and are present in all forms of life. They have diverse biological functions, including cell signaling and molecular recognition. C-type lectins (CTLs) are a group of proteins in the lectin family that have been studied extensively in animals and are reported to be involved in immune functions, carcinogenesis, and cell signaling. The carbohydrate recognition domain (CRD) of CTLs has a highly variable protein sequence, and proteins carrying this domain are also referred to as C-type lectin domain containing proteins (CTLD). Because of this low sequence homology, homology-based programs are limited in their ability to identify CTLD among the hypothetical proteins of sequenced genomes. Machine learning (ML) tools use characteristic features to identify homologous sequences, and this approach has been used to develop a tool for identifying CTLD. Initially, 500 well-annotated CTLD sequences and 500 non-CTLD sequences were used to develop the machine learning model. The LinearSVC classifier from the scikit-learn library for Python was used, and characteristic features of CTLD sequences, such as dipeptide and tripeptide composition, served as training attributes in various classifiers. On an external test set, the model achieved precision, recall, and Matthews correlation coefficient (MCC) values of 0.92, 0.91, and 0.82, respectively. After fine-tuning parameters such as kernel, C value, gamma, and degree, and increasing the number of non-CTLD sequences, precision, recall, and MCC improved to 0.99, 0.99, and 0.96, respectively. New CTLD have also been identified in the hypothetical segment of the human genome using the trained model. The tool is available on our local server for interested users.
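The abstract gives no implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: dipeptide composition features, a LinearSVC classifier from scikit-learn, and precision, recall, and MCC computed on a held-out set. All function names (`dipeptide_composition`, `precision_recall_mcc`, `train_and_evaluate`) are hypothetical, labels are assumed to be 1 for CTLD and 0 for non-CTLD, and tripeptide composition (8000 features) would follow the same pattern with `repeat=3`.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed feature order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def precision_recall_mcc(y_true, y_pred):
    """Precision, recall, and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


def train_and_evaluate(train_seqs, train_labels, test_seqs, test_labels, C=1.0):
    """Fit LinearSVC on dipeptide features and score an external test set."""
    # Imported here so the helpers above also work without scikit-learn installed.
    from sklearn.svm import LinearSVC

    model = LinearSVC(C=C)  # C=1.0 is an assumed default, not the authors' value
    model.fit([dipeptide_composition(s) for s in train_seqs], train_labels)
    predicted = model.predict([dipeptide_composition(s) for s in test_seqs])
    return model, precision_recall_mcc(test_labels, predicted.tolist())
```

Note that `LinearSVC` has no `kernel`, `gamma`, or `degree` parameters. Tuning those, as the abstract mentions, would require a kernel SVM such as `sklearn.svm.SVC`, for example searched with `sklearn.model_selection.GridSearchCV`. That is one plausible route, not necessarily the one the authors took.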
