Shahrjooihaghighi Aliasghar, Frigui Hichem, Zhang Xiang, Wei Xiaoli, Shi Biyun, Trabelsi Ameni
Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA.
Department of Chemistry, University of Louisville, Louisville, KY 40292, USA.
Proc IEEE Int Symp Signal Proc Inf Tech. 2017 Dec;2017:416-421. doi: 10.1109/ISSPIT.2017.8388679. Epub 2018 Jun 21.
Feature selection in Liquid Chromatography-Mass Spectrometry (LC-MS)-based metabolomics data (biomarker discovery) has become an important topic for machine learning researchers. The high dimensionality and small sample size of LC-MS data make feature selection a challenging task. The goal of biomarker discovery is to select the few most discriminative features from among a large number of irrelevant ones. To improve the reliability of the discovered biomarkers, we use an ensemble-based approach. Ensemble learning can improve the accuracy of feature selection by combining multiple algorithms that carry complementary information. In this paper, we propose an ensemble approach that combines the results of filter-based feature selection methods. To evaluate the proposed approach, we compared it with two commonly used methods, the t-test and PLS-DA, on a real data set.
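The abstract does not say which filters are combined or how their results are merged, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two illustrative filters (an absolute Welch t-statistic and a rank-based AUC separation score) and assumes mean-rank aggregation as the combination rule. The function names (`welch_t`, `auc_separation`, `ranks`, `ensemble_select`) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups (parametric filter)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def auc_separation(a, b):
    """Distance of the pairwise AUC from 0.5 (rank-based filter)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return abs(wins / (len(a) * len(b)) - 0.5)


def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    result = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result


def ensemble_select(X, y, k, filters=(welch_t, auc_separation)):
    """Return the indices of the k features with the best mean rank.

    X is a list of samples (each a list of feature values) and y holds
    binary class labels (0 or 1). Mean-rank aggregation is an assumption
    made for this sketch.
    """
    n_features = len(X[0])
    group0 = [row for row, label in zip(X, y) if label == 0]
    group1 = [row for row, label in zip(X, y) if label == 1]
    per_filter_ranks = []
    for score in filters:
        scores = [
            score([row[j] for row in group0], [row[j] for row in group1])
            for j in range(n_features)
        ]
        per_filter_ranks.append(ranks(scores))
    mean_rank = [mean(r[j] for r in per_filter_ranks) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: mean_rank[j])[:k]


# Made-up toy data: 6 samples, 4 features; only feature 2 separates the classes.
X = [
    [1.0, 5.0, 0.1, 3.0],
    [1.2, 4.0, 0.2, 3.1],
    [0.9, 6.0, 0.0, 2.9],
    [1.1, 5.5, 2.0, 3.0],
    [1.0, 4.5, 2.1, 3.2],
    [1.3, 5.2, 1.9, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

print(ensemble_select(X, y, k=1))
```

Aggregating ranks rather than raw scores keeps filters with different scales comparable, which is one common reason to prefer it in a sketch like this; other combination rules (voting, weighted ranks) would fit the same structure.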