Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20740, USA
Laboratoire de Physiologie Cellulaire et Végétale, iRTSV, CEA-Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
Nucleic Acids Res. 2014 Jul;42(Web Server issue):W198-204. doi: 10.1093/nar/gku427. Epub 2014 May 30.
Pairwise comparison of data vectors makes up a large part of computational biology, especially as genome-wide approaches yield ever more information from more biological samples simultaneously. Gene clustering for function prediction, analysis of signalling pathways and analysis of the time-dependent dynamics of a system are common biological approaches that often rely on comparing large datasets. Different metrics, such as correlation coefficients and distances, can be used to evaluate the similarity between the entities being compared. While distances offer a more flexible way of measuring potential biological relationships between datasets, the significance of any given distance is highly dependent on the dataset and cannot be easily determined. Monte Carlo methods provide a robust way to evaluate the significance of distance values: the dataset is randomly permuted many times and the distance is recalculated after each permutation. We have developed R. S. WebTool (http://rswebtool.kwaklab.org), a user-friendly online server for random sampling-based evaluation of distance significance. It features an array of visualization and analysis tools that help non-bioinformatician users separate significant relationships from random noise in distance-based dataset analyses.
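The permutation idea described in the abstract can be illustrated with a minimal sketch. This is not the R. S. WebTool implementation: the abstract does not specify the distance metric, the permutation scheme or the number of iterations, so the Euclidean distance, the shuffling of one vector, the add-one correction and the names `euclidean` and `permutation_p_value` are all assumptions made here for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_p_value(a, b, n_permutations=1000, distance=euclidean, seed=0):
    """Estimate how often a random permutation of b lies at least as close
    to a as the observed pairing does.

    Returns (observed_distance, empirical_p_value). Hypothetical helper,
    not part of any published tool.
    """
    rng = random.Random(seed)      # seeded so the sketch is reproducible
    observed = distance(a, b)
    shuffled = list(b)             # copy, so the caller's data is untouched
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)      # random permutation of the second vector
        if distance(a, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    profile_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    profile_b = [1.1, 2.1, 2.9, 4.2, 5.0, 5.8, 7.1, 8.0]
    d, p = permutation_p_value(profile_a, profile_b)
    print(f"distance = {d:.3f}, empirical p = {p:.4f}")
```

A small empirical p-value means that few random permutations produce a distance as small as the observed one, which suggests the observed proximity is unlikely to be due to chance alone. Which permutation scheme is appropriate depends on the structure of the data, so this choice should be treated as an assumption rather than a recommendation.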