New finite-size correction for local alignment score distributions.

Author information

Park Yonil, Sheetlin Sergey, Ma Ning, Madden Thomas L, Spouge John L

Affiliation

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.

Publication information

BMC Res Notes. 2012 Jun 12;5:286. doi: 10.1186/1756-0500-5-286.

DOI:10.1186/1756-0500-5-286
PMID:22691307
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3483159/
Abstract

BACKGROUND

Local alignment programs often calculate the probability that a match occurred by chance. The calculation of this probability may require a "finite-size" correction to the lengths of the sequences, as an alignment that starts near the end of either sequence may run out of sequence before achieving a significant score.
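To make the edge effect concrete, the sketch below compares an uncorrected E-value, E = K·m·n·exp(-λS), with one commonly described mean-based correction that subtracts an expected alignment length ℓ = ln(K·m·n)/H from each sequence length. The function names, the `floor` parameter, and the parameter values (λ = 0.267, K = 0.041, H = 0.14) are illustrative assumptions, not values or code taken from this paper.

```python
import math


def evalue_uncorrected(score, m, n, lam, K):
    """E-value with no edge correction: E = K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


def evalue_mean_corrected(score, m, n, lam, K, H, floor=1.0):
    """E-value after subtracting a mean alignment length from both sequences.

    The expected length ell = ln(K * m * n) / H is one textbook form of the
    correction. `floor` is a hypothetical stand-in for the ad hoc minimum
    length that a program has to substitute when m - ell or n - ell drops
    to zero or below.
    """
    ell = max(0.0, math.log(K * m * n) / H)
    m_eff = max(m - ell, floor)
    n_eff = max(n - ell, floor)
    return K * m_eff * n_eff * math.exp(-lam * score)


if __name__ == "__main__":
    # Illustrative (assumed) statistical parameters for a gapped protein search.
    lam, K, H = 0.267, 0.041, 0.14
    for m in (50, 300, 1000):
        raw = evalue_uncorrected(50, m, 10**6, lam, K)
        corrected = evalue_mean_corrected(50, m, 10**6, lam, K, H)
        print(m, raw, corrected)
```

With these assumed parameters a 50-residue query is shorter than the expected alignment length, so the effective length falls back to the arbitrary floor. That is the kind of short-sequence case where a mean-based correction becomes unreliable.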

FINDINGS

We present an improved finite-size correction that considers the distribution of sequence lengths rather than simply the corresponding means. This approach improves sensitivity and avoids substituting an ad hoc length for short sequences, a substitution that can underestimate the significance of a match. We use a test set derived from ASTRAL to show improved ROC scores, especially for shorter sequences.
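The difference between using a mean and using a distribution can be illustrated in one dimension. The sketch below assumes, purely for illustration, that the length L consumed by an alignment is normally distributed with mean `mu` and standard deviation `sigma`. It then compares the mean-based effective length max(m - mu, 0) with the expectation E[max(m - L, 0)]. The normal model, the function names, and the numbers are assumptions; the published correction may differ in detail.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_based_length(m, mu):
    """Effective length using only the mean: max(m - mu, 0)."""
    return max(m - mu, 0.0)


def distribution_based_length(m, mu, sigma):
    """Effective length E[max(m - L, 0)] for L ~ Normal(mu, sigma^2).

    Closed form: (m - mu) * Phi(z) + sigma * phi(z), with z = (m - mu) / sigma.
    """
    if sigma <= 0.0:
        return mean_based_length(m, mu)
    z = (m - mu) / sigma
    return (m - mu) * normal_cdf(z) + sigma * normal_pdf(z)


if __name__ == "__main__":
    mu, sigma = 100.0, 20.0  # assumed values, for illustration only
    for m in (60, 100, 140, 1000):
        print(m, mean_based_length(m, mu), distribution_based_length(m, mu, sigma))
```

Under this assumed model the distribution-based length stays positive even when m is at or below `mu`, so no arbitrary minimum is needed, and for long sequences the two quantities agree closely.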

CONCLUSIONS

The new finite-size correction improves the calculation of probabilities for a local alignment. It is now used in the BLAST+ package and at the NCBI BLAST web site (http://blast.ncbi.nlm.nih.gov).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4cf/3483159/dacbe8817d30/1756-0500-5-286-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4cf/3483159/8851be1943aa/1756-0500-5-286-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4cf/3483159/66e6886d72b2/1756-0500-5-286-3.jpg
