Structured sparsity regularization for analyzing high-dimensional omics data.

Affiliation

INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal.

Publication information

Brief Bioinform. 2021 Jan 18;22(1):77-87. doi: 10.1093/bib/bbaa122.

Abstract

The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating a considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms due to the intrinsic non-identifiability of the inherent optimization problem. Regularized optimization has been rising as a promising and useful strategy to solve these ill-posed problems by imposing additional constraints in the solution parameter space. In particular, the field of statistical learning with sparsity has been significantly contributing to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods that combine lasso with ridge penalizations, we briefly overview recent literature on structured regularizers and penalty functions that have been applied in biomedical data to build parsimonious models in a variety of underlying contexts, from survival to generalized linear models. These methods include functions of $\ell_k$-norms and network-based penalties that take into account the inherent relationships between the features. The successful application to omics data illustrates the potential of sparse structured regularization for identifying disease's molecular signatures and for creating high-performance clinical decision support systems towards more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
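
As a rough illustration of the two penalty families named in the abstract, the following sketch writes out one common parameterization of each. The notation is an assumption made here and is not taken from the paper: $L(\beta)$ is a generic loss (for example, a negative log-likelihood), $\lambda, \lambda_1, \lambda_2 \ge 0$ and $\alpha \in [0, 1]$ are tuning parameters, and $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the unnormalized Laplacian of a feature graph with symmetric edge weights $a_{ij}$ and edge set $E$.

$$
\hat{\beta}_{\text{EN}} = \arg\min_{\beta} \left\{ L(\beta) + \lambda \left[ \alpha \, \|\beta\|_1 + \frac{1-\alpha}{2} \, \|\beta\|_2^2 \right] \right\}
$$

$$
\hat{\beta}_{\text{net}} = \arg\min_{\beta} \left\{ L(\beta) + \lambda_1 \, \|\beta\|_1 + \lambda_2 \, \beta^{\top} \mathbf{L} \, \beta \right\},
\qquad
\beta^{\top} \mathbf{L} \, \beta = \sum_{(i,j) \in E} a_{ij} \, (\beta_i - \beta_j)^2
$$

In the first expression, $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge. In the second, the quadratic term shrinks the coefficients of connected features toward each other, which is one way a penalty can take relationships between features into account. Other network-based penalties exist, and the specific forms reviewed in the paper may differ.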
