Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification in Flu Virus Data.

Authors

Han Lei, Zhang Yu, Wan Xiu-Feng, Zhang Tong

Affiliations

Department of Statistics, Rutgers University.

Department of Computer Science and Engineering, Hong Kong University of Science and Technology.

Publication

KDD. 2016 Aug;2016:865-874. doi: 10.1145/2939672.2939786.

DOI: 10.1145/2939672.2939786

PMID: 28392970

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5382970/

Abstract

Recent statistical evidence has shown that regression models that incorporate interactions among the original covariates/features can significantly improve interpretability for biological data. A major challenge is that the feature space grows exponentially when high-order feature interactions are added to the model. To tackle this dimensionality, hierarchical sparse models (HSM) have been developed that enforce sparsity under heredity structures on the interactions among covariates. However, existing methods consider only pairwise interactions, which leaves the discovery of important high-order interactions a non-trivial open problem. In this paper, we propose a generalized hierarchical sparse model (GHSM), which generalizes HSM to arbitrary-order interactions. The GHSM applies the ℓ0 penalty to all model coefficients under the constraint that, for any covariate, if none of its associated kth-order interactions contributes to the regression model, then neither does any of its associated higher-order interactions. The resulting objective function is non-convex, and the main challenge lies in the coupled variables that appear in the arbitrary-order hierarchical constraints; we devise an efficient optimization algorithm that solves it directly. Specifically, we decouple the variables in the constraints via the general iterative shrinkage and thresholding (GIST) method and the alternating direction method of multipliers (ADMM), yielding three subproblems, each of which is proved to admit an efficient analytical solution. We evaluate GHSM on both a synthetic problem and the antigenic sites identification problem for influenza virus data, where we expand the feature space up to 5th-order interactions. Empirical results demonstrate the effectiveness and efficiency of the proposed methods, and the learned high-order interactions exhibit meaningful synergistic covariate patterns in influenza virus antigenicity.
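The abstract names three ingredients in words: the combinatorial growth of the feature space, the heredity constraint, and the non-convex ℓ0 penalty handled with GIST. The Python sketch below illustrates each under stated assumptions and is not the authors' implementation. All function names are hypothetical; an interaction is assumed to be the product of a subset of distinct covariates; the constraint check is one literal reading of the sentence in the abstract; and `hard_threshold` is only the standard proximal step of an unconstrained ℓ0 problem, whereas the paper's GIST/ADMM subproblems also enforce the hierarchical constraint, which this sketch does not.

```python
# Illustrative sketch only; hypothetical helper names, not the authors' code.
from itertools import combinations
from math import comb, prod


def num_interaction_features(p, max_order):
    """Count features obtained by expanding p covariates with every
    interaction of order 1..max_order (one feature per covariate subset)."""
    return sum(comb(p, k) for k in range(1, max_order + 1))


def expand_interactions(x, max_order):
    """Map one sample x (a sequence of numbers) to {index_tuple: product of
    the covariates in that tuple} for all subsets of size 1..max_order."""
    return {
        subset: prod(x[j] for j in subset)
        for k in range(1, max_order + 1)
        for subset in combinations(range(len(x)), k)
    }


def satisfies_hierarchy(support, p, max_order):
    """Assumed literal reading of the heredity rule: for every covariate j
    and order k, if no active k-th order term contains j, then no active
    term of order > k may contain j. `support` holds the index tuples of
    the nonzero coefficients."""
    for j in range(p):
        # Orders at which covariate j appears in at least one active term.
        orders = {len(s) for s in support if j in s}
        for k in range(1, max_order + 1):
            if k not in orders and any(o > k for o in orders):
                return False
    return True


def hard_threshold(u, lam, t):
    """Proximal operator of (lam / t) * ||w||_0 applied to the gradient step
    u = w - grad_f(w) / t in a GIST-style iteration (unconstrained case):
    keep u_i only if u_i**2 > 2 * lam / t, otherwise set it to zero."""
    return [ui if ui * ui > 2.0 * lam / t else 0.0 for ui in u]


if __name__ == "__main__":
    print(num_interaction_features(100, 5))  # 79375495
    print(satisfies_hierarchy({(0,), (1,), (0, 1)}, p=3, max_order=2))  # True
    print(satisfies_hierarchy({(0, 1)}, p=3, max_order=2))  # False
    print(hard_threshold([3.0, 0.5, -2.0], lam=1.0, t=1.0))  # [3.0, 0.0, -2.0]
```

With a hypothetical 100 covariates expanded to 5th-order interactions, the count already exceeds 79 million features, which is the dimensionality problem the hierarchical constraint is meant to tame. Under this reading, a support satisfies the constraint exactly when the orders at which each covariate is active form an unbroken run 1, 2, ..., m.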
