PUlasso: High-Dimensional Variable Selection With Presence-Only Data.

Authors

Song Hyebin, Raskutti Garvesh

Affiliation

Department of Statistics, University of Wisconsin-Madison, Madison, WI.

Publication information

J Am Stat Assoc. 2019;115(529):334-347. doi: 10.1080/01621459.2018.1546587. Epub 2019 Apr 11.

Abstract

In various real-world problems, we are presented with classification problems with positive and unlabeled data, referred to as presence-only responses. In this article, we study variable selection in the context of presence-only responses where the number of features or covariates $p$ is large. The combination of presence-only responses and a large number of features presents both statistical and computational challenges. We develop the PUlasso algorithm for variable selection and classification with positive and unlabeled responses. Our algorithm uses the majorization-minimization framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee: we first show that our algorithm converges to a stationary point, and then prove that any stationary point within a local neighborhood of the true parameter achieves the minimax optimal mean-squared error under both strict sparsity and group sparsity assumptions. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in the moderate $p$ settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example. Supplementary materials for this article are available online.




