CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data.

Authors

Bressler Ryan, Kreisberg Richard B, Bernard Brady, Niederhuber John E, Vockley Joseph G, Shmulevich Ilya, Knijnenburg Theo A

Affiliations

Institute for Systems Biology, Seattle, WA, United States of America.

Inova Translational Medicine Institute, Inova Health System and Inova Fairfax Medical Center, Falls Church, VA, United States of America.

Publication information

PLoS One. 2015 Dec 17;10(12):e0144820. doi: 10.1371/journal.pone.0144820. eCollection 2015.

Abstract

Random Forest has become a standard data analysis tool in computational biology. However, existing implementations often need extensions to handle the complexity of biological datasets and their associated research questions, and the growing size of these datasets requires high-performance implementations. We describe CloudForest, a Random Forest package written in Go that is particularly well suited to large, heterogeneous genetic and biomedical datasets. CloudForest includes several extensions, such as handling of unbalanced classes and missing values, and its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimizations for different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest.
