Battogtokh Bilguunzaya, Mojirsheibani Majid, Malley James
Center for Information Technology, National Institutes of Health, Bethesda, MD USA.
Department of Mathematics, California State University Northridge, Northridge, CA USA.
BioData Min. 2017 May 19;10:16. doi: 10.1186/s13040-017-0135-7. eCollection 2017.
Any family of learning machines can be combined into a single learning machine by various methods, with widely varying degrees of usefulness.
We introduce one such combination, the Optimal Crowd, for making predictions on an outcome. Given sufficient data, it is provably at least as good as the best machine in the family. Moreover, if any machine in the family minimizes the probability of misclassification in the limit of large data, then the Optimal Crowd does as well; that is, the Optimal Crowd is asymptotically Bayes optimal whenever any machine in the crowd is.
The only assumption needed to prove optimality is that the outcome variable is bounded. The scheme is illustrated on real-world data from the UCI Machine Learning Repository, and possible extensions are proposed.
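The abstract does not spell out how the Optimal Crowd is constructed. As a minimal illustration of the weaker "at least as good as the best machine in the family" property, the hypothetical sketch below combines a family of simple threshold classifiers by selecting the member with the smallest empirical misclassification rate on held-out data; this is only one standard combination idea, not the paper's actual scheme.

```python
# Hypothetical sketch (not the paper's construction): combine a family of
# learning machines by picking the member with the smallest empirical
# misclassification rate on held-out data.
import random

random.seed(0)

def make_data(n):
    # Toy 1-D problem: label is 1 when x > 0.5, with 10% label noise.
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def make_machine(t):
    # A "machine" here is just a threshold classifier at cutoff t.
    return lambda x: 1 if x > t else 0

# The family of candidate machines.
machines = [make_machine(t / 10) for t in range(1, 10)]

holdout = make_data(500)  # data used to score the family
test = make_data(500)     # fresh data to evaluate the chosen combination

def error(machine, data):
    # Empirical probability of misclassification.
    return sum(machine(x) != y for x, y in data) / len(data)

# "Crowd" combination: select the machine that minimizes held-out error.
crowd = min(machines, key=lambda m: error(m, holdout))

print("crowd test error:", round(error(crowd, test), 3))
print("best single-machine test error:",
      round(min(error(m, test) for m in machines), 3))
```

With enough data, the selected machine's error concentrates near that of the best family member, which is the flavor of guarantee the abstract describes.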