Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire; Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire; Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire.
J Cell Physiol. 2014 Dec;229(12):1896-900. doi: 10.1002/jcp.24662.
Recent technological advances allow for high-throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us into the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including "unsupervised" and "supervised" machine learning algorithms, with examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming-based solutions, we review web servers that allow users with limited or no programming background to perform these analyses on large data compendia.
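The abstract contrasts unsupervised and supervised machine learning without illustrating either. The following is a minimal, dependency-free Python sketch of the distinction. It is not taken from the review and does not reproduce the API of any R package or web server the review discusses. k-means clustering (unsupervised) groups unlabeled samples, and a nearest-centroid classifier (supervised) learns from samples whose class labels are known. The toy two-gene "expression profiles", the `normal`/`tumor` labels, and all function names are hypothetical. The farthest-first initialization is a deterministic simplification chosen for reproducibility, not k-means++.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]  # one sample's (hypothetical) expression profile


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of profiles."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Unsupervised: group unlabeled profiles into k clusters (Lloyd's algorithm)."""
    # Deterministic farthest-first initialization (a simplification, not k-means++).
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist(p, c) for c in centroids)))
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
        if new == assignment:  # converged: no profile changed cluster
            break
        assignment = new
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = _mean(members)
    return assignment


def fit_nearest_centroid(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Point]:
    """Supervised: learn one centroid per known class label."""
    by_label: Dict[str, List[Point]] = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: _mean(ps) for y, ps in by_label.items()}


def predict(model: Dict[str, Point], point: Point) -> str:
    """Assign a new sample to the class whose centroid is closest."""
    return min(model, key=lambda y: _dist(point, model[y]))


if __name__ == "__main__":
    # Hypothetical two-gene profiles for six samples (two well-separated groups).
    samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
    print(kmeans(samples, k=2))  # → [0, 0, 0, 1, 1, 1]
    model = fit_nearest_centroid(samples, ["normal"] * 3 + ["tumor"] * 3)
    print(predict(model, (4.7, 5.0)))  # → tumor
```

Here k-means never sees the labels; it recovers the two groups only because they are well separated, and its cluster numbers carry no biological meaning. The classifier, by contrast, requires labeled training samples up front but can then name the class of a new sample.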