Structured feature selection using coordinate descent optimization.

Authors

Ghalwash Mohamed F, Cao Xi Hang, Stojkovic Ivan, Obradovic Zoran

Affiliations

Center for Data Analytics and Biomedical Informatics, College of Science and Technology, Temple University, North 12th Street, Philadelphia, 19122, PA, USA.

Mathematics Department, Faculty of Science, Ain Shams University, Cairo, 11331, Egypt.

Publication information

BMC Bioinformatics. 2016 Apr 8;17:158. doi: 10.1186/s12859-016-0954-4.

DOI:10.1186/s12859-016-0954-4
PMID:27059502
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4826549/
Abstract

BACKGROUND

Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, the features are organized into groups based on prior knowledge. The problem addressed in this article is how to select one representative feature from each group such that the selected features jointly discriminate between the classes. The problem is formulated as a binary-constrained optimization problem. This combinatorial problem is relaxed into a convex-concave problem, which is then transformed into a sequence of convex optimization problems that can be solved by any standard optimization algorithm. Moreover, a block coordinate gradient descent optimization algorithm is proposed for high-dimensional feature selection, which in our experiments was four times faster than a standard optimization algorithm.
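To make the "one representative per group" constraint concrete, the following is a minimal sketch and not the authors' algorithm. It skips the convex-concave relaxation entirely and instead updates one group at a time on the discrete problem, using a regularised Fisher criterion as an assumed stand-in objective. The function names, the `ridge` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def fisher_score(X, y, cols, ridge=1e-3):
    """Regularised Fisher criterion for the feature subset `cols` (an assumed objective)."""
    Xs = X[:, cols]
    X0, X1 = Xs[y == 0], Xs[y == 1]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    # Sum of the two within-class covariance matrices, plus a ridge term for invertibility.
    Sw = np.atleast_2d(np.cov(X0, rowvar=False)) + np.atleast_2d(np.cov(X1, rowvar=False))
    Sw = Sw + ridge * np.eye(len(cols))
    return float(diff @ np.linalg.solve(Sw, diff))

def select_one_per_group(X, y, groups, n_sweeps=10):
    """Pick exactly one column index from each group, updating one group per step."""
    chosen = [g[0] for g in groups]  # start from the first feature of every group
    for _ in range(n_sweeps):
        changed = False
        for k, g in enumerate(groups):
            def score_with(j):
                # Joint score when group k uses feature j and the other groups stay fixed.
                return fisher_score(X, y, chosen[:k] + [j] + chosen[k + 1:])
            best = max(g, key=score_with)
            if score_with(best) > score_with(chosen[k]):  # accept strict improvements only
                chosen[k] = best
                changed = True
        if not changed:  # a full sweep without changes: local optimum reached
            break
    return chosen

# Synthetic illustration: two groups of three features, one informative feature in each.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[:, 1] += 2.0 * y  # informative feature in group 0
X[:, 5] += 2.0 * y  # informative feature in group 1
groups = [[0, 1, 2], [3, 4, 5]]
chosen = select_one_per_group(X, y, groups)
print(chosen)
```

Because the score is evaluated on the whole selected subset, each group's choice depends on what the other groups currently hold, which is the sense in which the selection is joint. Such a discrete search only reaches a local optimum.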

RESULTS

To test the effectiveness of the proposed formulation, we used microarray analysis as a case study, where genes with similar expressions or similar molecular functions were grouped together. In particular, the proposed block coordinate gradient descent feature selection method is evaluated on five benchmark microarray gene expression datasets and evidence is provided that the proposed method gives more accurate results than the state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13 experiments while the other methods achieved higher average AUC in no more than 6 experiments.
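As an illustration of the evaluation metric, the sketch below computes AUC as the fraction of correctly ordered (positive, negative) pairs and then counts how often each method attains the highest average AUC. The numbers in `toy_avg_auc` are invented for illustration and are not results from the paper. The function name `auc` is likewise an assumption.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

example_auc = auc(np.array([0.1, 0.4, 0.35, 0.8]), np.array([0, 0, 1, 1]))

# Toy table of average AUCs: rows are methods, columns are experiments (made-up values).
toy_avg_auc = np.array([
    [0.90, 0.80, 0.70, 0.95],
    [0.85, 0.82, 0.60, 0.90],
    [0.80, 0.79, 0.75, 0.91],
])
winners = toy_avg_auc.argmax(axis=0)         # index of the best method per experiment
wins = np.bincount(winners, minlength=toy_avg_auc.shape[0])
print(example_auc, wins.tolist())
```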

CONCLUSION

A method is developed to select one feature from each group. When the features are grouped based on similarity in gene expression, we showed that the proposed algorithm is more accurate than state-of-the-art gene selection methods that were specifically developed to select highly discriminative and less redundant genes. In addition, the proposed method can exploit any grouping structure among features, while alternative methods are restricted to similarity-based grouping.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/ed57eb857c37/12859_2016_954_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/814d9172fbae/12859_2016_954_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/130cac35c10a/12859_2016_954_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/af53f3ee6f99/12859_2016_954_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/9534212d71b3/12859_2016_954_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/f27e910e2358/12859_2016_954_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/167c260c2f24/12859_2016_954_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/1fc5771947af/12859_2016_954_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cb8/4826549/107764a9528c/12859_2016_954_Fig9_HTML.jpg

Similar articles

1
Structured feature selection using coordinate descent optimization.
BMC Bioinformatics. 2016 Apr 8;17:158. doi: 10.1186/s12859-016-0954-4.
2
An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data.
Comput Methods Programs Biomed. 2024 Feb;244:107987. doi: 10.1016/j.cmpb.2023.107987. Epub 2023 Dec 21.
3
Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.
Comput Biol Chem. 2015 Jun;56:49-60. doi: 10.1016/j.compbiolchem.2015.03.001. Epub 2015 Mar 18.
4
Unsupervised Feature Selection via Nonnegative Spectral Analysis and Redundancy Control.
IEEE Trans Image Process. 2015 Dec;24(12):5343-55. doi: 10.1109/TIP.2015.2479560. Epub 2015 Sep 17.
5
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Mol Inform. 2015 Apr;34(4):197-215. doi: 10.1002/minf.201400065. Epub 2015 Mar 20.
6
CCFS: A cooperating coevolution technique for large scale feature selection on microarray datasets.
Comput Biol Chem. 2018 Apr;73:171-178. doi: 10.1016/j.compbiolchem.2018.02.006. Epub 2018 Feb 17.
7
FSMRank: feature selection algorithm for learning to rank.
IEEE Trans Neural Netw Learn Syst. 2013 Jun;24(6):940-52. doi: 10.1109/TNNLS.2013.2247628.
8
An Innovative Excited-ACS-IDGWO Algorithm for Optimal Biomedical Data Feature Selection.
Biomed Res Int. 2020 Aug 17;2020:8506365. doi: 10.1155/2020/8506365. eCollection 2020.
9
The feature selection bias problem in relation to high-dimensional gene data.
Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.
10
Deep gene selection method to select genes from microarray datasets for cancer classification.
BMC Bioinformatics. 2019 Nov 27;20(1):608. doi: 10.1186/s12859-019-3161-2.

Cited by

1
HARVESTMAN: a framework for hierarchical feature learning and selection from whole genome sequencing data.
BMC Bioinformatics. 2021 Apr 1;22(1):174. doi: 10.1186/s12859-021-04096-6.
2
Curated Model Development Using NEUROiD: A Web-Based NEUROmotor Integration and Design Platform.
Front Neuroinform. 2019 Aug 7;13:56. doi: 10.3389/fninf.2019.00056. eCollection 2019.
3
Feature selection for high-dimensional temporal data.
BMC Bioinformatics. 2018 Jan 23;19(1):17. doi: 10.1186/s12859-018-2023-7.
4
Robust gene selection methods using weighting schemes for microarray data analysis.
BMC Bioinformatics. 2017 Sep 2;18(1):389. doi: 10.1186/s12859-017-1810-x.
5
CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.
BMC Bioinformatics. 2017 Mar 14;18(1):169. doi: 10.1186/s12859-017-1578-z.
6
Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data.
BMC Med Genomics. 2016 Dec 5;9(Suppl 3):74. doi: 10.1186/s12920-016-0233-2.
7
Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.
BMC Bioinformatics. 2017 Jan 3;18(1):9. doi: 10.1186/s12859-016-1423-9.

References

1
Multi-task feature selection in microarray data by binary integer programming.
BMC Proc. 2013 Dec 20;7(Suppl 7):S5. doi: 10.1186/1753-6561-7-S7-S5.
2
WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013.
Nucleic Acids Res. 2013 Jul;41(Web Server issue):W77-83. doi: 10.1093/nar/gkt439. Epub 2013 May 23.
3
Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition.
BMC Bioinformatics. 2013 Mar 20;14:101. doi: 10.1186/1471-2105-14-101.
4
The actin cytoskeleton as a sensor and mediator of apoptosis.
Bioarchitecture. 2012 May 1;2(3):75-87. doi: 10.4161/bioa.20975.
5
Early classification of multivariate temporal observations by extraction of interpretable shapelets.
BMC Bioinformatics. 2012 Aug 8;13:195. doi: 10.1186/1471-2105-13-195.
6
Comparative evaluation of set-level techniques in predictive classification of gene expression samples.
BMC Bioinformatics. 2012 Jun 25;13 Suppl 10(Suppl 10):S15. doi: 10.1186/1471-2105-13-S10-S15.
7
A top-r feature selection algorithm for microarray gene expression data.
IEEE/ACM Trans Comput Biol Bioinform. 2012 May-Jun;9(3):754-64. doi: 10.1109/TCBB.2011.151.
8
Regularization Paths for Generalized Linear Models via Coordinate Descent.
J Stat Softw. 2010;33(1):1-22.
9
Beyond clustering of array expressions.
Int J Bioinform Res Appl. 2009;5(3):329-48. doi: 10.1504/IJBRA.2009.026423.
10
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.
Nat Protoc. 2009;4(1):44-57. doi: 10.1038/nprot.2008.211.