Feature selection for kernel methods in systems biology.

Authors

Brouard Céline, Mariette Jérôme, Flamary Rémi, Vialaneix Nathalie

Affiliations

Université de Toulouse, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France.

École Polytechnique, CMAP, F-91120, Palaiseau, France.

Publication information

NAR Genom Bioinform. 2022 Mar 7;4(1):lqac014. doi: 10.1093/nargab/lqac014. eCollection 2022 Mar.

DOI:10.1093/nargab/lqac014
PMID:35265835
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8900155/
Abstract

The substantial development of high-throughput biotechnologies has rendered large-scale multi-omics datasets increasingly available. New challenges have emerged to process and integrate this large volume of information, often obtained from widely heterogeneous sources. Kernel methods have proven successful to handle the analysis of different types of datasets obtained on the same individuals. However, they usually suffer from a lack of interpretability since the original description of the individuals is lost due to the kernel embedding. We propose novel feature selection methods that are adapted to the kernel framework and go beyond the well-established work in supervised learning by addressing the more difficult tasks of unsupervised learning and kernel output learning. The method is expressed under the form of a non-convex optimization problem with an ℓ₁ penalty, which is solved with a proximal gradient descent approach. It is tested on several systems biology datasets and shows good performances in selecting relevant and less redundant features compared to existing alternatives. It also proved relevant for identifying important governmental measures best explaining the time series of Covid-19 reproducing number evolution during the first months of 2020. The proposed feature selection method is embedded in the R package mixKernel version 0.8, published on CRAN. Installation instructions are available at http://mixkernel.clementine.wf/.
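
As a rough illustration of the optimization technique named in the abstract, the sketch below runs proximal gradient descent on an ℓ₁-penalized problem in plain Python. It is not the authors' objective and not the mixKernel API: a convex least-squares loss is swapped in for the paper's non-convex kernel-based criterion, and the function names, toy data, step size and penalty value are all assumptions made for this example. The only shared ingredient is the update rule, a gradient step on the smooth term followed by soft-thresholding, which is what drives some weights exactly to zero and thereby selects features.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient_l1(X, y, lam, step, n_iter=500):
    """Minimise (1/2n) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent.

    Illustrative stand-in only: the paper's loss is non-convex and kernel-based.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals of the smooth least-squares part.
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth part with respect to each weight.
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step, then the l1 proximal step (soft-thresholding).
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w


# Toy data (made up for illustration): y depends only on the first feature.
X = [[1.0, 0.5, -1.0],
     [2.0, -0.5, 1.0],
     [3.0, 0.5, 1.0],
     [4.0, -0.5, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]

w = proximal_gradient_l1(X, y, lam=0.1, step=0.1)
# Features whose weight was not shrunk to exactly zero are the selected ones.
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

On this toy data only the first weight remains non-zero, so `selected` contains just index 0. The step size is chosen below the inverse of the largest eigenvalue of XᵀX/n, which is the usual condition for the iteration to converge in the convex case.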

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/65e657c34d11/lqac014fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/0dc132e4fd89/lqac014fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/31212a2a1f53/lqac014fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/2b9cbdd4eb55/lqac014fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/754d10929d91/lqac014fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/1025f36f9c59/lqac014fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c31/8900155/548b0758a6bd/lqac014fig7.jpg
