Inza Iñaki, Calvo Borja, Armañanzas Rubén, Bengoetxea Endika, Larrañaga Pedro, Lozano José A
Intelligent Systems Group, Donostia - San Sebastián, Basque Country, Spain.
Methods Mol Biol. 2010;593:25-48. doi: 10.1007/978-1-60327-194-3_2.
The growing number and complexity of biological databases have raised the need for modern, powerful data analysis tools and techniques. To meet this need, machine learning has become an everyday tool in biological laboratories. Machine learning techniques are now applied across a wide spectrum of bioinformatics problems: they are broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and they are an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms and describe the characteristics of the main data preprocessing, supervised classification, and clustering techniques. We also present feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics. Finally, we point the interested reader to a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.