Approaches to the automatic discovery of patterns in biosequences.

Authors

Brazma A, Jonassen I, Eidhammer I, Gilbert D

Affiliation

EMBL Outstation-Hinxton, European Bioinformatics Institute, Cambridge, UK.

Publication information

J Comput Biol. 1998 Summer;5(2):279-305. doi: 10.1089/cmb.1998.5.279.

DOI:10.1089/cmb.1998.5.279
PMID:9672833
Abstract

This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis is presented of the ways in which an assessment can be made of the significance of the discovered patterns. It is shown that the problem is related to problems studied in the field of machine learning. The major part of this paper comprises a review of a number of existing methods developed to solve the problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examples are given of patterns that have been discovered using the different methods.
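As a rough illustration of what a pattern "in the class of regular languages" can look like in practice, the sketch below translates a simplified PROSITE-like pattern into a Python regular expression and counts how many sequences in a set it matches. The abstract does not name a specific notation, so the PROSITE-like syntax, the function names `prosite_to_regex` and `support`, and the toy sequences are all assumptions made for this example; it is not one of the algorithms reviewed in the paper.

```python
import re

# Assumed, simplified element syntax: a residue (A), a wildcard (x),
# an allowed set ([ST]), a forbidden set ({P}), each optionally followed
# by a repeat count such as (3) or (2,4). Elements are joined by "-".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-like pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        if symbol == "x":
            core = "."                          # wildcard: any residue
        elif symbol.startswith("{"):
            core = "[^" + symbol[1:-1] + "]"    # forbidden residues
        else:
            core = symbol                       # single residue or allowed set
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


def support(pattern: str, sequences: list[str]) -> int:
    """Count the sequences that contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


if __name__ == "__main__":
    toy_sequences = ["AACKKCQRSLAA", "CAACAAAW", "GGGG"]  # hypothetical data
    toy_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]"
    print(prosite_to_regex(toy_pattern))
    print(support(toy_pattern, toy_sequences))
```

A discovery method in the sense of the abstract would search a space of such patterns for ones with high support in the input set. Assessing whether that support is significant, for instance against the match frequency expected in unrelated sequences, is a separate step that this sketch does not attempt.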
