Structured feature selection using coordinate descent optimization.

Author information

Ghalwash Mohamed F, Cao Xi Hang, Stojkovic Ivan, Obradovic Zoran

Affiliations

Center for Data Analytics and Biomedical Informatics, College of Science and Technology, Temple University, North 12th Street, Philadelphia, 19122, PA, USA.

Mathematics Department, Faculty of Science, Ain Shams University, Cairo, 11331, Egypt.

Publication information

BMC Bioinformatics. 2016 Apr 8;17:158. doi: 10.1186/s12859-016-0954-4.

Abstract

BACKGROUND

Existing feature selection methods typically ignore prior knowledge in the form of structural relationships among features. In this study, features are organized into groups based on prior knowledge. The problem addressed in this article is how to select one representative feature from each group such that the selected features jointly discriminate between the classes. The problem is formulated as a binary constrained optimization; this combinatorial problem is relaxed to a convex-concave problem, which is then transformed into a sequence of convex optimization problems that can be solved by any standard optimization algorithm. In addition, a block coordinate gradient descent algorithm is proposed for high-dimensional feature selection; in our experiments it was four times faster than a standard optimization algorithm.
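To make the block-wise idea concrete, the following is a minimal sketch, not the formulation from the article. It assumes a stand-in least-squares objective, relaxes the binary "one feature per group" constraint to non-negative weights that sum to one within each group, and updates one group at a time with a projected gradient step. The names `project_simplex` and `select_one_per_group`, the objective, and the step-size rule are all illustrative assumptions; the convex-concave step that pushes the weights toward binary values is omitted.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept positive
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def select_one_per_group(X, y, groups, n_sweeps=200):
    """Hypothetical sketch of relaxed one-feature-per-group selection.

    Assumed objective (not the paper's): 0.5 * ||X w - y||^2, with the
    weights of every group constrained to the probability simplex.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = [np.asarray(g, dtype=int) for g in groups]

    w = np.zeros(X.shape[1])
    for g in groups:
        w[g] = 1.0 / len(g)                   # uniform start inside each group
    residual = X @ w - y

    for _ in range(n_sweeps):
        for g in groups:                      # one block (group) at a time
            Xg = X[:, g]
            # Step size 1 / L_g, where L_g is the block Lipschitz constant.
            step = 1.0 / (np.linalg.norm(Xg, 2) ** 2 + 1e-12)
            grad = Xg.T @ residual
            new_wg = project_simplex(w[g] - step * grad)
            residual += Xg @ (new_wg - w[g])  # keep the residual in sync
            w[g] = new_wg

    # Round the relaxed solution: keep the heaviest feature of each group.
    selected = [int(g[np.argmax(w[g])]) for g in groups]
    return w, selected
```

Called as `w, selected = select_one_per_group(X, y, groups)`, with `groups` given as a list of column-index lists, the function returns the relaxed weights and one column index per group. Because each update touches only one group's columns, the per-step cost depends on the group size rather than on the total number of features, which is the usual motivation for block coordinate methods in high-dimensional settings.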

RESULTS

To test the effectiveness of the proposed formulation, we used microarray analysis as a case study, in which genes with similar expression or similar molecular function were grouped together. In particular, the proposed block coordinate gradient descent feature selection method was evaluated on five benchmark microarray gene expression datasets, and the results provide evidence that it is more accurate than state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13 experiments, while each of the other methods achieved the highest average AUC in no more than 6 experiments.
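For readers unfamiliar with the evaluation metric, AUC (area under the ROC curve) equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auc` and the label convention (1 for the positive class) are assumptions for illustration, and the article does not state how its AUC values were computed.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Assumes labels equal to 1 mark the positive class; any other value
    is treated as negative.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive-negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of samples, which is acceptable for the small sample sizes typical of microarray datasets; a rank-based computation would be preferable for large ones.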

CONCLUSION

A method is developed to select one feature from each group. When the features are grouped based on similarity in gene expression, we show that the proposed algorithm is more accurate than state-of-the-art gene selection methods that were specifically developed to select highly discriminative and less redundant genes. In addition, the proposed method can exploit any grouping structure among features, whereas alternative methods are restricted to similarity-based grouping.


