Relation between weight matrix and substitution matrix: motif search by similarity.

Author

Zheng Wei-Mou

Affiliation

Institute of Theoretical Physics, Academia Sinica, Beijing 100080, China.

Publication information

Bioinformatics. 2005 Apr 1;21(7):938-43. doi: 10.1093/bioinformatics/bti090. Epub 2004 Oct 28.

DOI:10.1093/bioinformatics/bti090
PMID:15514002
Abstract

MOTIVATION

The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the result produced by the Gibbs sampler in a single run. The deterministic EM methods tend to get trapped by local optima. Solutions found by greedy approaches are rarely sufficiently good.

RESULTS

A simple model describing a motif or a portion of local multiple sequence alignment is the weight matrix model, in which a motif is characterized with position-specific probabilities. Two substitution matrices are proposed to relate the sequence similarity with the weight matrix. Combining the substitution matrix and weight matrix, we examine three typical sets of protein sequences with increasing complexity. At a low score threshold for pair similarity, sliding windows are compared with a seed window to find the score sum, which provides a measure of statistical significance for multiple sequence comparison. Such a similarity analysis reveals many aspects of motifs. Blocks determined by similarity can be used to deduce a primary weight matrix or an improved substitution matrix. The algorithm successfully obtains the optimal solution for the test sets by just greedy iteration.
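
The seed-window comparison described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the two proposed substitution matrices, so `pair_score` uses a toy match/mismatch score as a stand-in, and keeping only the best window per sequence is an assumption made here for simplicity. The function names, the pseudocount, and the example sequences are all hypothetical.

```python
from collections import Counter


def pair_score(seed, window, match=1, mismatch=-1):
    """Toy substitution score (assumption): +1 per match, -1 per mismatch."""
    return sum(match if a == b else mismatch for a, b in zip(seed, window))


def score_sum(seed, sequences, threshold):
    """Slide a window over each sequence and compare it with the seed.

    Assumption: only the best window per sequence is kept, and it counts
    toward the sum only if its pair score reaches the threshold.
    Returns the score sum and the block of selected windows.
    """
    width = len(seed)
    total = 0
    block = []
    for seq in sequences:
        best_score, best_window = None, None
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            score = pair_score(seed, window)
            if best_score is None or score > best_score:
                best_score, best_window = score, window
        if best_score is not None and best_score >= threshold:
            total += best_score
            block.append(best_window)
    return total, block


def weight_matrix(block, alphabet, pseudocount=1.0):
    """Position-specific probabilities estimated from an aligned block."""
    width = len(block[0])
    denom = len(block) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(width):
        counts = Counter(segment[pos] for segment in block)
        matrix.append({a: (counts[a] + pseudocount) / denom for a in alphabet})
    return matrix


# Hypothetical toy data over a four-letter alphabet, for illustration only.
sequences = ["AACGTAA", "TTCGTTT", "GGGGGGG"]
total, block = score_sum("CGT", sequences, threshold=1)
matrix = weight_matrix(block, "ACGT")
```

In this sketch the block selected by similarity feeds directly into the weight matrix, mirroring the statement that blocks determined by similarity can be used to deduce a primary weight matrix. A greedy iteration could then re-score windows against that matrix, but the abstract gives no details of that step, so it is left out here.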

Similar articles

1. Relation between weight matrix and substitution matrix: motif search by similarity.
   Bioinformatics. 2005 Apr 1;21(7):938-43. doi: 10.1093/bioinformatics/bti090. Epub 2004 Oct 28.
2. Motif-based protein ranking by network propagation.
   Bioinformatics. 2005 Oct 1;21(19):3711-8. doi: 10.1093/bioinformatics/bti608. Epub 2005 Aug 2.
3. Designing patterns for profile HMM search.
   Bioinformatics. 2007 Jan 15;23(2):e36-43. doi: 10.1093/bioinformatics/btl323.
4. A profile-based deterministic sequential Monte Carlo algorithm for motif discovery.
   Bioinformatics. 2008 Jan 1;24(1):46-55. doi: 10.1093/bioinformatics/btm543. Epub 2007 Nov 17.
5. Rapid motif-based prediction of circular permutations in multi-domain proteins.
   Bioinformatics. 2005 Apr 1;21(7):932-7. doi: 10.1093/bioinformatics/bti085.
6. DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
   BMC Bioinformatics. 2005 Mar 22;6:66. doi: 10.1186/1471-2105-6-66.
7. A metric model of amino acid substitution.
   Bioinformatics. 2004 May 22;20(8):1214-21. doi: 10.1093/bioinformatics/bth065. Epub 2004 Feb 10.
8. ARCS-Motif: discovering correlated motifs from unaligned biological sequences.
   Bioinformatics. 2009 Jan 15;25(2):183-9. doi: 10.1093/bioinformatics/btn609. Epub 2008 Dec 9.
9. Searching for three-dimensional secondary structural patterns in proteins with ProSMoS.
   Bioinformatics. 2007 Jun 1;23(11):1331-8. doi: 10.1093/bioinformatics/btm121. Epub 2007 Mar 24.
10. HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.
    BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.