Institute of Biomedical Technology, University of Tampere, Tampere, Finland.
Hum Mutat. 2013 Jan;34(1):42-9. doi: 10.1002/humu.22204. Epub 2012 Oct 11.
Several computational methods have been developed for predicting the effects of variations, for which data are rapidly accumulating. Comparing the performance of these tools has been very difficult because the methods have been trained and tested with different datasets. Until now, unbiased and representative benchmark datasets have been missing. To overcome this problem, we have developed a benchmark database suite, VariBench. VariBench contains datasets of experimentally verified, high-quality variation data carefully chosen from the literature and relevant databases. It maps variation positions to different levels (protein, RNA and DNA sequences, and protein three-dimensional structure) and maps identifiers to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, a field of high importance that is under active development. VariBench datasets can be used, for example, to test the performance of prediction tools as well as to train novel machine learning-based tools. New datasets will be included, and the community is encouraged to submit high-quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.