Grossfield, Alan; Zuckerman, Daniel M.
University of Rochester Medical Center, Department of Biochemistry and Biophysics, Box 712, Rochester, NY 14642, USA; 585-276-4193.
Annu Rep Comput Chem. 2009 Jan 1;5:23-48. doi: 10.1016/S1574-1400(09)00502-7.
Growing computing capacity and algorithmic advances have facilitated the study of increasingly large biomolecular systems at longer timescales. However, with these larger, more complex systems come questions about the quality of sampling and statistical convergence. What size systems can be sampled fully? If a system is not fully sampled, can certain "fast variables" be considered well-converged? How can one determine the statistical significance of observed results? The present review describes statistical tools and the underlying physical ideas necessary to address these questions. Basic definitions and ready-to-use analyses are provided, along with explicit recommendations. Such statistical analyses are of paramount importance in establishing the reliability of simulation data in any given study.
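To make the flavor of such analyses concrete, below is a minimal sketch (not taken from the paper) of block averaging, a standard tool of the kind this review surveys for estimating statistical error in correlated simulation data. The function name and parameters are illustrative assumptions; the idea is that the estimated standard error of the mean plateaus once blocks exceed the correlation time of the observable.

```python
# Hypothetical illustration: block averaging for a correlated time series.
# The standard error of the mean is estimated from block-to-block scatter
# at increasing block sizes; it levels off once blocks are longer than the
# correlation time of the observable.
import numpy as np

def block_average_error(x, max_blocks=64):
    """Estimate the standard error of the mean of a correlated series
    by averaging over non-overlapping blocks of increasing length."""
    x = np.asarray(x, dtype=float)
    results = []
    n_blocks = max_blocks
    while n_blocks >= 2:
        block_len = len(x) // n_blocks
        if block_len < 1:
            n_blocks //= 2
            continue
        # Mean of each non-overlapping block
        trimmed = x[: n_blocks * block_len]
        block_means = trimmed.reshape(n_blocks, block_len).mean(axis=1)
        # Standard error of the mean from the scatter of block means
        sem = block_means.std(ddof=1) / np.sqrt(n_blocks)
        results.append((block_len, sem))
        n_blocks //= 2
    return results

# Example: an AR(1) series with a long correlation time. The naive SEM
# (treating samples as independent) would badly underestimate the error;
# the blocked estimate grows with block length until it plateaus.
rng = np.random.default_rng(0)
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = 0.99 * x[i - 1] + rng.normal()
for block_len, sem in block_average_error(x):
    print(f"block length {block_len:6d}: SEM ~ {sem:.4f}")
```

The plateau value, rather than the naive single-sample estimate, is the defensible error bar for a time-correlated observable; the absence of a plateau is itself a warning that the run is too short to assess convergence.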