Wu Jianhua, Kendrick Keith M, Feng Jianfeng
Department of Computer Science, Warwick University, Coventry CV4 7AL, UK.
BMC Bioinformatics. 2007 Sep 11;8:331. doi: 10.1186/1471-2105-8-331.
Progressive advances in the measurement of complex multifactorial components of biological processes, spanning both spatial and temporal domains, have produced large data sets in which it is difficult to identify, using conventional statistical approaches, the variables (genes, proteins, neurons, etc.) whose activity changes significantly in response to a stimulus. The set of all such changed variables is termed the hot-spots. Detecting hot-spots is considered to be an NP-hard problem, but by first establishing its theoretical foundation we have been able to develop an algorithm that provides a solution.
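To illustrate why an exhaustive search for the hot-spot set is impractical, the following minimal Python sketch counts the candidate subsets of n variables and runs a brute-force search on a toy example. The objective used here (sum of per-variable scores minus a size penalty) and the names `count_candidate_subsets`, `brute_force_hot_spots`, `scores` and `penalty` are assumptions made for illustration only; they are not the statistic or the algorithm developed in this work.

```python
from itertools import combinations


def count_candidate_subsets(n_variables: int) -> int:
    """Number of non-empty subsets of n variables, i.e. 2^n - 1."""
    return 2 ** n_variables - 1


def brute_force_hot_spots(scores, penalty=1.0):
    """Exhaustively search every non-empty subset of variables.

    Toy objective (an assumption, not the paper's statistic):
    sum of the scores in the subset minus penalty * subset size.
    Returns the best subset (as a tuple of indices) and its value.
    """
    n = len(scores)
    best_subset, best_value = (), float("-inf")
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            value = sum(scores[i] for i in subset) - penalty * size
            if value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value


if __name__ == "__main__":
    # Hypothetical per-variable change scores; variables 1 and 3 respond strongly.
    toy_scores = [0.1, 3.0, 0.2, 2.5, 0.05]
    print(brute_force_hot_spots(toy_scores))
    # The search space doubles with every added variable.
    for n in (5, 20, 40):
        print(n, count_candidate_subsets(n))
```

With only 40 variables the brute-force loop would already have to visit about 10^12 subsets, which is why a direct search over subsets does not scale to gene-array or imaging data.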
Our results show that a first-order phase transition is observable, whose critical point separates the hot-spot set from the remaining variables. The algorithm is also found to be more successful than existing approaches in identifying statistically significant hot-spots, both in simulated data sets and in real large-scale multivariate data sets from gene arrays, electrophysiological recordings and functional magnetic resonance imaging experiments.
In summary, this new statistical algorithm should provide a powerful new analytical tool to extract the maximum information from complex biological multivariate data.