Exploiting maximal dependence decomposition to identify conserved motifs from a group of aligned signal sequences.

Affiliation

Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan.

Publication information

Bioinformatics. 2011 Jul 1;27(13):1780-7. doi: 10.1093/bioinformatics/btr291. Epub 2011 May 6.

DOI:10.1093/bioinformatics/btr291

PMID:21551145

Abstract

UNLABELLED

Bioinformatics research often requires conservation analyses of a group of sequences associated with a specific biological function (e.g. transcription factor binding sites, microRNA target sites or protein post-translational modification sites). Because conserved motifs are difficult to explore in large-scale sequence data involving various signals, a new method, MDDLogo, was developed. MDDLogo applies maximal dependence decomposition (MDD) to cluster a group of aligned signal sequences into subgroups containing statistically significant motifs. To extract motifs that capture a conserved biochemical property of amino acids in protein sequences, the 20 amino acids are further categorized according to their physicochemical properties, e.g. hydrophobicity, charge or molecular size. MDDLogo has been demonstrated to accurately identify the kinase-specific substrate motifs in 1221 human phosphorylation sites associated with seven well-known kinase families from Phospho.ELM. Moreover, on a set of plant phosphorylation data lacking kinase information, MDDLogo has been applied to help investigate the substrate motifs of potential kinases and to improve the identification of plant phosphorylation sites with various substrate specificities. In this study, MDDLogo is comparable with another well-known motif discovery tool, Motif-X.
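The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the MDDLogo implementation: the amino acid grouping in `GROUPS` is an illustrative assumption, the function names are hypothetical, and only a single split is performed, without the recursion, significance threshold or minimum cluster size that a full MDD procedure would need. Residues are mapped to property groups, a Pearson chi-square statistic measures dependence between every pair of alignment positions, and the sequences are split on the position with the largest summed dependence.

```python
from collections import Counter
from itertools import product

# Illustrative grouping only (an assumption, not the grouping used by MDDLogo).
GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "basic": "KR",
    "acidic": "DE",
    "special": "GP",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def encode(seq):
    """Map each residue to its group label; unknown symbols become 'other'."""
    return tuple(AA_TO_GROUP.get(aa, "other") for aa in seq)


def chi_square(pairs):
    """Pearson chi-square statistic for a list of (label_i, label_j) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)
    stat = 0.0
    for a, b in product(rows, cols):
        expected = rows[a] * cols[b] / n  # always > 0 for observed labels
        stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def dependence_scores(encoded):
    """Sum each position's chi-square statistics against all other positions."""
    width = len(encoded[0])
    scores = []
    for i in range(width):
        total = sum(
            chi_square([(s[i], s[j]) for s in encoded])
            for j in range(width)
            if j != i
        )
        scores.append(total)
    return scores


def split_once(seqs):
    """Split on the most dependent position and its most frequent group."""
    encoded = [encode(s) for s in seqs]
    scores = dependence_scores(encoded)
    pos = max(range(len(scores)), key=scores.__getitem__)
    top_group = Counter(e[pos] for e in encoded).most_common(1)[0][0]
    with_group = [s for s, e in zip(seqs, encoded) if e[pos] == top_group]
    without_group = [s for s, e in zip(seqs, encoded) if e[pos] != top_group]
    return pos, top_group, with_group, without_group
```

Calling `split_once` on a list of equal-length aligned peptides returns the chosen position, the group used for the split, and the two resulting subgroups. Applying the same step recursively to each subgroup, with a stopping rule, would yield the kind of subgroup tree that MDD produces.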

CONTACT

francis@saturn.yzu.edu.tw

Similar articles

Exploiting maximal dependence decomposition to identify conserved motifs from a group of aligned signal sequences.

Bioinformatics. 2011 Jul 1;27(13):1780-7. doi: 10.1093/bioinformatics/btr291. Epub 2011 May 6.

MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs.

Bioinformatics. 2016 Jan 15;32(2):165-72. doi: 10.1093/bioinformatics/btv558. Epub 2015 Sep 26.

PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity.

BMC Bioinformatics. 2011 Jun 26;12:261. doi: 10.1186/1471-2105-12-261.

SNOSite: exploiting maximal dependence decomposition to identify cysteine S-nitrosylation with substrate site specificity.

PLoS One. 2011;6(7):e21849. doi: 10.1371/journal.pone.0021849. Epub 2011 Jul 15.

Identifying protein phosphorylation sites with kinase substrate specificity on human viruses.

PLoS One. 2012;7(7):e40694. doi: 10.1371/journal.pone.0040694. Epub 2012 Jul 23.

UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines.

BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):6. doi: 10.1186/s12918-015-0246-z.

MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition.

PLoS One. 2017 Jun 29;12(6):e0179529. doi: 10.1371/journal.pone.0179529. eCollection 2017.

Large-scale identification of phosphorylation sites for profiling protein kinase selectivity.

J Proteome Res. 2014 Jul 3;13(7):3410-9. doi: 10.1021/pr500319y. Epub 2014 Jun 4.

MDD-carb: a combinatorial model for the identification of protein carbonylation sites with substrate motifs.

BMC Syst Biol. 2017 Dec 21;11(Suppl 7):137. doi: 10.1186/s12918-017-0511-4.

Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites.

BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):384. doi: 10.1186/s12859-018-2394-9.

Cited by

A genetic algorithm-based ensemble model for efficiently identifying interleukin 6 inducing peptides.

Sci Rep. 2025 Jul 1;15(1):21213. doi: 10.1038/s41598-025-05491-2.

PredIL13: Stacking a variety of machine and deep learning methods with ESM-2 language model for identifying IL13-inducing peptides.

PLoS One. 2024 Aug 22;19(8):e0309078. doi: 10.1371/journal.pone.0309078. eCollection 2024.

ENCAP: Computational prediction of tumor T cell antigens with ensemble classifiers and diverse sequence features.

PLoS One. 2024 Jul 18;19(7):e0307176. doi: 10.1371/journal.pone.0307176. eCollection 2024.

Improved prediction of anti-angiogenic peptides based on machine learning models and comprehensive features from peptide sequences.

Sci Rep. 2024 Jun 22;14(1):14387. doi: 10.1038/s41598-024-65062-9.

AMPActiPred: A three-stage framework for predicting antibacterial peptides and activity levels with deep forest.

Protein Sci. 2024 Jun;33(6):e5006. doi: 10.1002/pro.5006.

StackDPP: a stacking ensemble based DNA-binding protein prediction model.

BMC Bioinformatics. 2024 Mar 14;25(1):111. doi: 10.1186/s12859-024-05714-9.

Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition.

Amino Acids. 2024 Mar 9;56(1):20. doi: 10.1007/s00726-023-03368-0.

Analysis and review of techniques and tools based on machine learning and deep learning for prediction of lysine malonylation sites in protein sequences.

Database (Oxford). 2024 Jan 19;2024. doi: 10.1093/database/baad094.

Improving prediction of drug-target interactions based on fusing multiple features with data balancing and feature selection techniques.

PLoS One. 2023 Aug 3;18(8):e0288173. doi: 10.1371/journal.pone.0288173. eCollection 2023.

Identification of intelligence-related proteins through a robust two-layer predictor.

Commun Integr Biol. 2022 Nov 15;15(1):253-264. doi: 10.1080/19420889.2022.2143101. eCollection 2022.
