König I R, Malley J D, Pajevic S, Weimar C, Diener H-C, Ziegler A
Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany.
Int J Data Min Bioinform. 2008;2(4):289-341. doi: 10.1504/ijdmb.2008.022149.
In the last 15 years, several machine learning approaches have been developed for classification and regression. We introduce, in an intuitive manner, the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss how the use of machine learning differs between the biomedical community and computer science. We propose methods for comparing learning machines on a sound statistical basis. Data from the German Stroke Study Collaboration are used for illustration. We compare the results from the learning machines with those obtained from a published logistic regression and discuss similarities and differences.