Bressler Ryan, Kreisberg Richard B, Bernard Brady, Niederhuber John E, Vockley Joseph G, Shmulevich Ilya, Knijnenburg Theo A
Institute for Systems Biology, Seattle, WA, United States of America.
Inova Translational Medicine Institute, Inova Health System and Inova Fairfax Medical Center, Falls Church, VA, United States of America.
PLoS One. 2015 Dec 17;10(12):e0144820. doi: 10.1371/journal.pone.0144820. eCollection 2015.
Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high-performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous genetic and biomedical datasets. CloudForest includes several extensions, such as handling unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimization for different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest.