Geurts Pierre, Irrthum Alexandre, Wehenkel Louis
Department of EE and CS & GIGA-Research, University of Liège, Belgium.
Mol Biosyst. 2009 Dec;5(12):1593-605. doi: 10.1039/b907946g. Epub 2009 Oct 5.
At the intersection of artificial intelligence and statistics, supervised learning allows algorithms to build predictive models automatically, using only observations of a system. Over the last twenty years, supervised learning has been a tool of choice for analyzing the ever-growing and increasingly complex data generated in molecular biology, with successful applications in genome annotation, function prediction, and biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as nonparametric methods that uniquely combine interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review gives an intuitive but complete description of decision tree-based methods and discusses their strengths and limitations relative to other supervised learning methods. The second part surveys their applications in computational and systems biology.