Wang Yi, Li Yi, Pu Weilin, Wen Kathryn, Shugart Yin Yao, Xiong Momiao, Jin Li
Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200433, China.
Unit on Statistical Genomics, Division of Intramural Division Programs, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA.
Sci Rep. 2016 Jul 22;6:30086. doi: 10.1038/srep30086.
Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).