Objective method for estimating asymptotic parameters, with an application to sequence alignment.

Authors

Sheetlin Sergey, Park Yonil, Spouge John L

Affiliation

National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA.

Publication

Phys Rev E Stat Nonlin Soft Matter Phys. 2011 Sep;84(3 Pt 1):031914. doi: 10.1103/PhysRevE.84.031914. Epub 2011 Sep 13.

DOI:10.1103/PhysRevE.84.031914
PMID:22060410
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3233989/
Abstract

Sequence alignment is an indispensable computational tool in modern molecular biology. The model underlying biological sequence alignment is of interest to physicists because it approximates the statistical mechanics of DNA and protein annealing, while bearing an intimate relationship to models of directed polymers in random media. Recent methods for determining the statistics of random sequence alignments have reduced the computation time to less than 1 s, opening up some interesting possibilities for online computation with biological search engines. Before implementation, however, the methods required an objective technique for computing regression coefficients pertinent to an asymptotic regime. Typically, physicists estimate parameters pertinent to an asymptotic regime subjectively: They eyeball their data; estimate the asymptotic regime where the regression model holds with reasonable accuracy; and then regress data only within the estimated asymptotic regime. Our publicly available computer program ARRP replaces the subjective assessment of the asymptotic regime with an objective change-point detection method, increasing confidence in the scientific objectivity of the parameter estimates. Asymptotic regression has potential applications across most of physics.
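
The abstract does not say which change-point criterion ARRP uses, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a straight-line regression model, Gaussian noise of known standard deviation `sigma`, and a hypothetical penalized cost: each candidate start index `k` of the asymptotic regime is scored by the scaled residual sum of squares of the tail fit plus a fixed `penalty` for every discarded point. The function names, the cost function, and the synthetic data are all illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def asymptotic_fit(xs, ys, sigma, penalty=2.0, min_tail=3):
    """Choose the regime start k by a penalized cost (an assumed criterion).

    cost(k) = RSS of the line fitted to points k..n-1, divided by sigma^2,
              plus penalty * k for the k points left out of the regression.
    Returns (k, slope, intercept) for the k with the smallest cost.
    """
    best = None
    for k in range(len(xs) - min_tail + 1):
        slope, intercept, rss = fit_line(xs[k:], ys[k:])
        cost = rss / sigma ** 2 + penalty * k
        if best is None or cost < best[0]:
            best = (cost, k, slope, intercept)
    _, k, slope, intercept = best
    return k, slope, intercept


if __name__ == "__main__":
    # Synthetic data: a line y = 2x + 1 plus a transient that decays away.
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 + 5.0 * math.exp(-x) for x in xs]

    naive_slope, naive_intercept, _ = fit_line(xs, ys)
    k, slope, intercept = asymptotic_fit(xs, ys, sigma=0.05)

    print("naive fit over all points:", naive_slope, naive_intercept)
    print("regime starts at index", k, "with fit:", slope, intercept)
```

In this sketch the early points, where the transient dominates, bias the naive fit, while the penalized search drops them without anyone having to eyeball the data. The choice of `penalty` and `sigma` still has to be justified for real data, which is the kind of decision an objective method such as ARRP is meant to settle.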
