Self-organizing hierarchic networks for pattern recognition in protein sequence.

Authors

Hanke J, Beckmann G, Bork P, Reich J G

Affiliation

Max-Delbrück-Center for Molecular Medicine, Department of Bioinformatics, Berlin-Buch, Germany.

Publication

Protein Sci. 1996 Jan;5(1):72-82. doi: 10.1002/pro.5560050109.

DOI:10.1002/pro.5560050109
PMID:8771198
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2143234/
Abstract

We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.
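The first training stage (one SOM extracting common ungapped-segment features from unaligned sequences) can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a one-dimensional map, nodes stored as fixed-width position-specific residue-frequency tables, a placeholder identity similarity matrix standing in for a real substitution matrix such as BLOSUM62, and a simple linearly decaying learning rate and neighbourhood. All function names (`train_som`, `support`, etc.) are hypothetical, and stages two (matrix selection) and three (feature positions) are not shown.

```python
import random

# 20 standard amino acids; other symbols (X, B, Z, gaps) are not handled here.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AMINO)}


def identity_matrix(match=1.0, mismatch=0.0):
    """Hypothetical placeholder similarity matrix; a real run would load e.g. BLOSUM62."""
    return [[match if i == j else mismatch for j in range(20)] for i in range(20)]


def windows(seq, width):
    """All ungapped segments of length `width`; no prealignment is needed."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def one_hot(segment):
    """Width x 20 table with 1.0 in each residue's column."""
    return [[1.0 if a == IDX[res] else 0.0 for a in range(20)] for res in segment]


def score(profile, segment, sim):
    """Similarity of a segment to a node profile under the matrix `sim`."""
    return sum(
        profile[pos][a] * sim[a][IDX[res]]
        for pos, res in enumerate(segment)
        for a in range(20)
    )


def train_som(seqs, width=4, n_nodes=6, epochs=20, lr0=0.5, seed=0, sim=None):
    """Train a 1-D SOM whose nodes are position-specific residue profiles (assumed design)."""
    rng = random.Random(seed)
    sim = sim or identity_matrix()
    segs = [w for s in seqs for w in windows(s, width)]
    # Initialise every node from a randomly chosen learning segment.
    nodes = [one_hot(rng.choice(segs)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                          # decaying learning rate
        radius = int(round(n_nodes / 2 * (1 - frac)))  # shrinking neighbourhood
        rng.shuffle(segs)
        for seg in segs:
            # Winner = node most similar to the segment under `sim`.
            best = max(range(n_nodes), key=lambda k: score(nodes[k], seg, sim))
            target = one_hot(seg)
            for k in range(max(0, best - radius), min(n_nodes, best + radius + 1)):
                h = lr / (1 + abs(k - best))           # weaker pull on map neighbours
                for pos in range(width):
                    for a in range(20):
                        # Convex update keeps each row a probability distribution.
                        nodes[k][pos][a] += h * (target[pos][a] - nodes[k][pos][a])
    return nodes


def support(nodes, seqs, width, sim=None):
    """Fraction of sequences in which each node wins at least one segment."""
    sim = sim or identity_matrix()
    hits = [0] * len(nodes)
    for s in seqs:
        winners = {
            max(range(len(nodes)), key=lambda k: score(nodes[k], w, sim))
            for w in windows(s, width)
        }
        for k in winners:
            hits[k] += 1
    return [h / len(seqs) for h in hits]


if __name__ == "__main__":
    # Toy demo: plant the motif "WCHW" in random background sequences.
    rng = random.Random(1)

    def background(n):
        return "".join(rng.choice(AMINO) for _ in range(n))

    seqs = [background(10) + "WCHW" + background(10) for _ in range(6)]
    nodes = train_som(seqs, width=4, epochs=10)
    print(support(nodes, seqs, width=4))
```

Under these assumptions, nodes whose support approaches 1.0 are candidate "features" in the abstract's sense, i.e. clusters of similar ungapped segments found in most learning sequences; a faithful reproduction would need the paper's actual similarity matrices, map topology, and training schedule.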

Similar articles

1. Self-organizing hierarchic networks for pattern recognition in protein sequence.
   Protein Sci. 1996 Jan;5(1):72-82. doi: 10.1002/pro.5560050109.
2. Detecting patterns in protein sequences.
   J Mol Biol. 1994 Jun 24;239(5):698-712. doi: 10.1006/jmbi.1994.1407.
3. Temporally asymmetric learning supports sequence processing in multi-winner self-organizing maps.
   Neural Comput. 2004 Mar;16(3):535-61. doi: 10.1162/089976604772744901.
4. Self-organized neural maps of human protein sequences.
   Protein Sci. 1994 Mar;3(3):507-21. doi: 10.1002/pro.5560030316.
5. Finding flexible patterns in unaligned protein sequences.
   Protein Sci. 1995 Aug;4(8):1587-95. doi: 10.1002/pro.5560040817.
6. Large scale analysis of protein-binding cavities using self-organizing maps and wavelet-based surface patches to describe functional properties, selectivity discrimination, and putative cross-reactivity.
   Proteins. 2008 May 15;71(3):1288-306. doi: 10.1002/prot.21823.
7. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
8. Amino acid recognition by Venus flytrap domains is encoded in an 8-residue motif.
   Biopolymers. 2005;80(2-3):357-66. doi: 10.1002/bip.20229.
9. Recognition of multiple patterns in unaligned sets of sequences: comparison of kernel clustering method with other methods.
   Bioinformatics. 2004 Jul 10;20(10):1512-6. doi: 10.1093/bioinformatics/bth111.
10. Self-organizing neural networks to support the discovery of DNA-binding motifs.
   Neural Netw. 2006 Jul-Aug;19(6-7):950-62. doi: 10.1016/j.neunet.2006.05.023. Epub 2006 Jul 12.