Amparo Alonso-Betanzos, Verónica Bolón-Canedo, Laura Morán-Fernández, Borja Seijo-Pardo
CITIC, Universidade da Coruña, A Coruña, Spain.
Methods Mol Biol. 2019;1986:123-152. doi: 10.1007/978-1-4939-9442-7_6.
A typical characteristic of microarray data is a very high number of features (on the order of thousands), while the number of samples is usually fewer than 100. In microarray classification, this poses a challenge for machine learning methods, which can suffer from overfitting and, as a result, degraded performance. A common solution is to apply a dimensionality reduction technique before classification to reduce the number of features. This chapter focuses on one of the best-known dimensionality reduction techniques: feature selection. We will see how feature selection can help improve classification accuracy in several microarray data scenarios.
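As a rough illustration of the "select features, then classify" idea, the following Python sketch applies a simple filter-style feature selection step before a classifier, on synthetic data with far more features than samples. Everything in it is an assumption for illustration: the synthetic data generator (`make_data`), the absolute Welch t statistic used to rank features (`t_scores`), and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity, not the methods evaluated in the chapter.

```python
import math
import random


def make_data(n_per_class=20, n_features=2000, informative=(0, 1, 2, 3, 4),
              shift=4.0, seed=0):
    """Hypothetical synthetic data: many features, few samples.

    Only the features in `informative` differ between the two classes
    (mean shifted by `shift` in class 1); every other feature is noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def t_scores(X, y):
    """Filter criterion (an assumption here): absolute Welch t statistic per feature."""
    scores = []
    for j in range(len(X[0])):
        summary = []
        for c in (0, 1):
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
            summary.append((mean, var, len(vals)))
        (m0, v0, n0), (m1, v1, n1) = summary
        scores.append(abs(m1 - m0) / math.sqrt(v0 / n0 + v1 / n1 + 1e-12))
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Per-class mean vector, restricted to the selected features."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(row[j] for row in rows) / len(rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    x = [row[j] for j in features]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


# Independent synthetic draws for training and testing.
X_train, y_train = make_data(seed=0)
X_test, y_test = make_data(seed=1)

# 1) Feature selection, fitted on the training split only.
selected = select_top_k(t_scores(X_train, y_train), k=5)
# 2) Classification using only the selected features.
centroids = fit_centroids(X_train, y_train, selected)
preds = [predict(centroids, row, selected) for row in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("selected features:", sorted(selected), "test accuracy:", accuracy)
```

Note that the features are scored on the training split only. Ranking features on all the data before splitting would leak information from the test samples and give an optimistic accuracy estimate, a pitfall that is especially easy to fall into when features vastly outnumber samples.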