Convergent Island Statistics: a fast method for determining local alignment score significance.

Authors

Poleksic Aleksandar, Danzer Joseph F, Hambly Kevin, Debe Derek A

Affiliation

Eidogen-Sertanty Inc., 9381 Judicial Dr., San Diego, CA 92121, USA.

Publication

Bioinformatics. 2005 Jun 15;21(12):2827-31. doi: 10.1093/bioinformatics/bti433. Epub 2005 Apr 7.

DOI: 10.1093/bioinformatics/bti433
PMID: 15817690
Abstract

MOTIVATION

Background distribution statistics for profile-based sequence alignment algorithms cannot be calculated analytically, and hence such algorithms must resort to measuring the significance of an alignment score by assessing its location among a distribution of background alignment scores. The Gumbel parameters that describe this background distribution are usually pre-computed for a limited number of scoring systems, gap schemes, and sequence lengths and compositions. The use of such look-ups is known to introduce errors, which compromise the significance assessment of a remote homology relationship. One solution is to estimate the background distribution for each pair of interest by generating a large number of sequence shuffles and using the distribution of their scores to approximate the parameters of the underlying extreme value distribution. This is computationally very expensive, because a large number of shuffles is needed to estimate the score statistics precisely.
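To make the cost of the brute-force approach concrete, here is a minimal sketch of per-pair shuffling followed by a Gumbel fit. It assumes a toy Smith-Waterman scorer (match +2, mismatch -1, linear gap -2) rather than a profile-based scorer, shuffles only the second sequence, and fits the Gumbel location and scale by the method of moments. The names `sw_score`, `fit_gumbel_moments` and `shuffle_pvalue` and all parameter values are hypothetical; this is not the CIS method.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Toy Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, lam)."""
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    lam = math.pi / (sd * math.sqrt(6))                 # variance = pi^2 / (6 lam^2)
    mu = statistics.fmean(scores) - EULER_GAMMA / lam   # mean = mu + gamma / lam
    return mu, lam


def shuffle_pvalue(a: str, b: str, n_shuffles: int = 200, seed: int = 0):
    """Score a against shuffles of b, fit a Gumbel, return (score, mu, lam, p)."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    background = []
    for _ in range(n_shuffles):
        shuffled = list(b)
        rng.shuffle(shuffled)  # keeps the length and composition of b
        background.append(sw_score(a, "".join(shuffled)))
    mu, lam = fit_gumbel_moments(background)
    # P(S >= s) = 1 - exp(-exp(-lam * (s - mu))); expm1 keeps precision for tiny p.
    p = -math.expm1(-math.exp(-lam * (observed - mu)))
    return observed, mu, lam, p
```

Under the Karlin-Altschul form E = K m n e^(-λS), the fitted location corresponds to μ = ln(Kmn)/λ, so K ≈ e^(λμ)/(mn) for sequence lengths m and n. Every call performs `n_shuffles` full dynamic-programming alignments for a single pair, which is the expense the abstract describes.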

RESULTS

Convergent Island Statistics (CIS) is a computationally efficient solution to the problem of calculating the Gumbel distribution parameters for an arbitrary pair of sequences and an arbitrary set of gap and scoring schemes. The basic idea behind our method is to recognize the lack of similarity for any pair of sequences early in the shuffling process and thus save on the search time. The method is particularly useful in the context of profile-profile alignment algorithms where the normalization of alignment scores has traditionally been a challenging task.
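The abstract does not spell out the CIS convergence criterion, so the sketch below illustrates only the stated idea of abandoning the shuffling early once a pair clearly shows no similarity. It uses a generic sequential rule that is an assumption, not the authors' method: keep shuffling until `h` shuffled scores reach the observed score (then estimate p as h/n), otherwise stop at a fixed budget and use an add-one Monte Carlo estimate. The scorer and all names (`sw_score`, `early_stop_pvalue`, `h`, `max_shuffles`) are hypothetical.

```python
import random


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Toy Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def early_stop_pvalue(a: str, b: str, h: int = 10, max_shuffles: int = 1000, seed: int = 0):
    """Shuffle b until h shuffled scores reach the observed score or the budget runs out.

    Returns (observed_score, shuffles_used, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    exceed = 0
    for n in range(1, max_shuffles + 1):
        shuffled = list(b)
        rng.shuffle(shuffled)
        if sw_score(a, "".join(shuffled)) >= observed:
            exceed += 1
            if exceed == h:
                # The observed score looks like background: stop early.
                return observed, n, exceed / n
    # Budget exhausted without h exceedances: add-one Monte Carlo estimate.
    return observed, max_shuffles, (exceed + 1) / (max_shuffles + 1)
```

For example, `early_stop_pvalue("ACDEFGHIKL", "MNPQRSTVWY")` shares no residues, so every shuffle ties the observed score of 0 and the loop stops after only 10 shuffles, whereas a strongly similar pair uses the full budget. Unrelated pairs dominate a database search, so this is where early stopping would save the most time.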

CONTACT

aleksandar@eidogen.com

SUPPLEMENTARY INFORMATION

http://www.eidogen-sertanty.com/Documents/convergent_island_stats_sup.pdf.

Similar articles

1
Convergent Island Statistics: a fast method for determining local alignment score significance.
Bioinformatics. 2005 Jun 15;21(12):2827-31. doi: 10.1093/bioinformatics/bti433. Epub 2005 Apr 7.
2
A statistical method for alignment-free comparison of regulatory sequences.
Bioinformatics. 2007 Jul 1;23(13):i249-55. doi: 10.1093/bioinformatics/btm211.
3
STRUCTFAST: protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring.
Proteins. 2006 Sep 1;64(4):960-7. doi: 10.1002/prot.21049.
4
Prediction of Ras-effector interactions using position energy matrices.
Bioinformatics. 2007 Sep 1;23(17):2226-30. doi: 10.1093/bioinformatics/btm336. Epub 2007 Jun 28.
5
Calibrating E-values for hidden Markov models using reverse-sequence null models.
Bioinformatics. 2005 Nov 15;21(22):4107-15. doi: 10.1093/bioinformatics/bti629. Epub 2005 Aug 25.
6
Designing patterns for profile HMM search.
Bioinformatics. 2007 Jan 15;23(2):e36-43. doi: 10.1093/bioinformatics/btl323.
7
Prediction of functional specificity determinants from protein sequences using log-likelihood ratios.
Bioinformatics. 2006 Jan 15;22(2):164-71. doi: 10.1093/bioinformatics/bti766. Epub 2005 Nov 8.
8
Quasi-consensus-based comparison of profile hidden Markov models for protein sequences.
Bioinformatics. 2005 May 15;21(10):2287-93. doi: 10.1093/bioinformatics/bti374. Epub 2005 Mar 29.
9
Incremental window-based protein sequence alignment algorithms.
Bioinformatics. 2007 Jan 15;23(2):e17-23. doi: 10.1093/bioinformatics/btl297.
10
Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues.
Bioinformatics. 2005 Jun 15;21(12):2821-6. doi: 10.1093/bioinformatics/bti432. Epub 2005 Apr 7.

Cited by

1
Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power.
BMC Bioinformatics. 2012 Apr 12;13 Suppl 5(Suppl 5):S3. doi: 10.1186/1471-2105-13-S5-S3.
2
Island method for estimating the statistical significance of profile-profile alignment scores.
BMC Bioinformatics. 2009 Apr 20;10:112. doi: 10.1186/1471-2105-10-112.
3
Ligand-binding pocket shape differences between sphingosine 1-phosphate (S1P) receptors S1P1 and S1P3 determine efficiency of chemical probe identification by ultrahigh-throughput screening.
ACS Chem Biol. 2008 Aug 15;3(8):486-98. doi: 10.1021/cb800051m. Epub 2008 Jul 1.