Sheetlin Sergey, Park Yonil, Spouge John L
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2011 Sep;84(3 Pt 1):031914. doi: 10.1103/PhysRevE.84.031914. Epub 2011 Sep 13.
Sequence alignment is an indispensable computational tool in modern molecular biology. The model underlying biological sequence alignment is of interest to physicists because it approximates the statistical mechanics of DNA and protein annealing, while bearing an intimate relationship to models of directed polymers in random media. Recent methods for determining the statistics of random sequence alignments have reduced the computation time to less than 1 s, opening up some interesting possibilities for online computation with biological search engines. Before implementation, however, the methods required an objective technique for computing regression coefficients pertinent to an asymptotic regime. Typically, physicists estimate parameters pertinent to an asymptotic regime subjectively: They eyeball their data; estimate the asymptotic regime where the regression model holds with reasonable accuracy; and then regress data only within the estimated asymptotic regime. Our publicly available computer program ARRP replaces the subjective assessment of the asymptotic regime with an objective change-point detection method, increasing confidence in the scientific objectivity of the parameter estimates. Asymptotic regression has potential applications across most of physics.
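The abstract does not describe the change-point method that ARRP uses, so the following is only a minimal sketch of the general idea, under stated assumptions: the data are assumed to follow one straight line before an unknown change point and a different straight line (the asymptotic regime) after it, and the change point is chosen objectively as the split that minimizes the total squared error of the two fits. The function names `fit_line` and `asymptotic_fit` and the parameter `min_points` are hypothetical and are not taken from ARRP.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def asymptotic_fit(
    x: Sequence[float], y: Sequence[float], min_points: int = 3
) -> Tuple[int, float, float]:
    """Return (k, a, b): the change-point index k and the intercept and slope
    of the line fitted to the points x[k:], taken as the asymptotic regime.

    Assumption (not from the source): a two-segment linear model, with k
    chosen to minimize the summed squared error of both segments.
    """
    n = len(x)
    best = None
    # Each segment keeps at least min_points points so both fits are defined.
    for k in range(min_points, n - min_points + 1):
        _, _, sse_left = fit_line(x[:k], y[:k])
        a, b, sse_right = fit_line(x[k:], y[k:])
        total = sse_left + sse_right
        if best is None or total < best[0]:
            best = (total, k, a, b)
    if best is None:
        raise ValueError("need at least 2 * min_points data points")
    _, k, a, b = best
    return k, a, b


if __name__ == "__main__":
    # Synthetic data: a transient for x < 8, then the regime y = 2 + 0.5 * x.
    xs = [float(i) for i in range(1, 21)]
    ys = [10.0 - xi if xi < 8 else 2.0 + 0.5 * xi for xi in xs]
    print(asymptotic_fit(xs, ys))
```

On the synthetic data in the usage example, the search places the change point where the transient ends and regresses only on the remaining points, so no one has to choose the cutoff by eye. Real data with noise or a curved transient would probably need a more careful model and a statistical criterion for accepting the change point.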