An expert system for processing sequence homology data.

Authors

Sonnhammer E L, Durbin R

Affiliation

Sanger Centre, Hinxton, Cambridge, UK.

Publication

Proc Int Conf Intell Syst Mol Biol. 1994;2:363-8.

PMID:7584413
Abstract

When confronted with the task of finding homology to large numbers of sequences, database searching tools such as Blast and Fasta generate prohibitively large amounts of information. An automatic way of making most of the decisions a trained sequence analyst would make was developed by means of a rule-based expert system combined with an algorithm to avoid non-informative biased residue composition matches. The results found relevant by the system are presented in a very concise and clear way, so that the homology can be assessed with minimum effort. The expert system, HSPcrunch, was implemented to process the output of the programs in the BLAST suite. HSPcrunch embodies rules on detecting distant similarities when pairs of weak matches are consistent with a larger gapped alignment, i.e. when Blast has broken a longer gapped alignment up into smaller ungapped ones. This way, more distant similarities can be detected with few or no side effects in the form of additional spurious matches. The rules for how small the gaps must be to be considered significant have been derived empirically. Currently a set of rules is used that operates on two different scoring levels, one for very weak matches that have very small gaps and one for medium weak matches that have slightly larger gaps. This set of rules proved to be robust for most cases and gives high fidelity separation between real homologies and spurious matches. One of the most important rules for reducing the amount of output is to limit the number of overlapping matches to the same region of the query sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
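
The two rules named in the abstract can be illustrated with a minimal Python sketch. This is not the HSPcrunch implementation: the abstract gives no numeric thresholds, so the score cut-offs, gap limits, coverage limit, and every name below (`HSP`, `TIERS`, `STRONG_SCORE`, `consistent_pair`, `rescue_weak_hsps`, `limit_coverage`) are hypothetical. The sketch also assumes that the "gap" is the larger of the unaligned stretches on the query and the subject, and that matches are on the same strand and frame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HSP:
    """One ungapped match; coordinates are 1-based, inclusive, start <= end."""
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


# ASSUMPTION: every number below is invented for illustration only.
STRONG_SCORE = 75.0          # a match at or above this score stands alone
TIERS = [                    # (largest gap allowed, lowest score allowed)
    (5, 35.0),               # very weak matches, very small gaps
    (20, 50.0),              # medium-weak matches, slightly larger gaps
]


def gap_between(first: HSP, second: HSP) -> Optional[int]:
    """Gap size if `second` follows `first` on both sequences, else None."""
    q_gap = second.q_start - first.q_end - 1
    s_gap = second.s_start - first.s_end - 1
    if q_gap < 0 or s_gap < 0:
        return None          # overlapping or out of order: not colinear
    return max(q_gap, s_gap)


def consistent_pair(a: HSP, b: HSP) -> bool:
    """True if two HSPs could be pieces of one gapped alignment."""
    if a.subject != b.subject:
        return False
    first, second = sorted((a, b), key=lambda h: h.q_start)
    gap = gap_between(first, second)
    if gap is None:
        return False
    weakest = min(a.score, b.score)
    return any(gap <= max_gap and weakest >= min_score
               for max_gap, min_score in TIERS)


def rescue_weak_hsps(hsps: List[HSP]) -> List[HSP]:
    """Keep strong HSPs plus weak ones backed by a consistent partner."""
    return [h for h in hsps
            if h.score >= STRONG_SCORE
            or any(consistent_pair(h, o) for o in hsps if o is not h)]


def limit_coverage(hsps: List[HSP], max_cover: int = 2) -> List[HSP]:
    """Visit HSPs best-first; drop one whose whole query span is already
    covered by `max_cover` kept matches."""
    depth = {}               # query position -> number of kept matches
    kept = []
    for h in sorted(hsps, key=lambda h: h.score, reverse=True):
        span = range(h.q_start, h.q_end + 1)
        if min(depth.get(p, 0) for p in span) >= max_cover:
            continue
        kept.append(h)
        for p in span:
            depth[p] = depth.get(p, 0) + 1
    return kept
```

In this sketch a weak match survives only when a second match to the same subject lies close enough downstream or upstream on both sequences, and the tolerated gap grows with the score of the weaker partner. The coverage filter is greedy by score, so the matches it drops are always those already overshadowed along their full length by better ones.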

Similar articles

1
An expert system for processing sequence homology data.
Proc Int Conf Intell Syst Mol Biol. 1994;2:363-8.
2
A workbench for large-scale sequence homology analysis.
Comput Appl Biosci. 1994 Jun;10(3):301-7. doi: 10.1093/bioinformatics/10.3.301.
3
FLASH: a fast look-up algorithm for string homology.
Proc Int Conf Intell Syst Mol Biol. 1993;1:56-64.
4
BLAST and FASTA similarity searching for multiple sequence alignment.
Methods Mol Biol. 2014;1079:75-101. doi: 10.1007/978-1-62703-646-7_5.
5
Finding protein and nucleotide similarities with FASTA.
Curr Protoc Bioinformatics. 2004 Feb;Chapter 3:Unit3.9. doi: 10.1002/0471250953.bi0309s04.
6
TruMatch--a BLAST post-processor that identifies bona fide sequence matches to genome assemblies.
Bioinformatics. 2005 May 1;21(9):2097-8. doi: 10.1093/bioinformatics/bti257. Epub 2005 Jan 25.
7
FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory.
J Mol Biol. 1996 Jun 21;259(4):840-54. doi: 10.1006/jmbi.1996.0362.
8
H-tuple approach to evaluate statistical significance of biological sequence comparison with gaps.
Stat Appl Genet Mol Biol. 2007;6:Article 22. doi: 10.2202/1544-6115.1272. Epub 2007 Aug 25.
9
Improved gapped alignment in BLAST.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jul-Sep;1(3):116-29. doi: 10.1109/TCBB.2004.32.
10
Using progressive methods for global multiple sequence alignment.
Cold Spring Harb Protoc. 2009 Jul;2009(7):pdb.top43. doi: 10.1101/pdb.top43.

Cited by

1
Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea.
Genome Res. 2004 Nov;14(11):2221-34. doi: 10.1101/gr.2700304.
2
A genome annotation-driven approach to cloning the human ORFeome.
Genome Biol. 2004;5(10):R84. doi: 10.1186/gb-2004-5-10-r84. Epub 2004 Sep 30.
3
A Fugu-Human Genome Synteny Viewer: web software for graphical display and annotation reports of synteny between Fugu genomic sequence and human genes.
Nucleic Acids Res. 2004 May 11;32(8):2618-22. doi: 10.1093/nar/gkh573. Print 2004.
4
An optimized set of human telomere clones for studying telomere integrity and architecture.
Am J Hum Genet. 2000 Aug;67(2):320-32. doi: 10.1086/302998. Epub 2000 Jun 22.
5
BioViews: Java-based tools for genomic data visualization.
Genome Res. 1998 Mar;8(3):291-305. doi: 10.1101/gr.8.3.291.
6
Characterization of short tandem repeats from thirty-one human telomeres.
Genome Res. 1997 Sep;7(9):917-23. doi: 10.1101/gr.7.9.917.