Meta-Analysis Based on Nonconvex Regularization.

Affiliations

Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau.

School of Mathematics, Northwest University, 710127, Xi'an, China.

Publication information

Sci Rep. 2020 Apr 1;10(1):5755. doi: 10.1038/s41598-020-62473-2.

PMID:32238826

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7113298/

Abstract

The widespread application of high-throughput sequencing technology has produced a large number of publicly available gene expression datasets. However, because gene expression datasets are characterized by small sample sizes, high dimensionality and high noise, analyzing them with biostatistics and machine learning methods is challenging; for example, important biomarkers identified in one study are often poorly reproduced in others. Meta-analysis is an effective approach to these problems, but current methods have some limitations. In this paper, we propose meta-analysis methods based on three nonconvex regularization methods: L1/2 regularization (meta-Half), Minimax Concave Penalty regularization (meta-MCP) and Smoothly Clipped Absolute Deviation regularization (meta-SCAD). These three nonconvex regularization methods are effective variable selection approaches developed in recent years. Through a hierarchical decomposition of the coefficients, our methods not only maintain flexibility in variable selection and improve the efficiency of selecting important biomarkers, but also summarize and synthesize scientific evidence from multiple studies while accounting for the relationships between datasets. We present efficient algorithms for our methods and establish their theoretical properties. Furthermore, we apply our methods to simulated data and to three publicly available lung cancer gene expression datasets, and compare their performance with that of state-of-the-art methods. Our methods perform well in simulation studies, and the results on the three lung cancer datasets are clinically meaningful. Our methods can also be extended to other areas where datasets are heterogeneous.
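The abstract names three nonconvex penalties without defining them. As a hedged illustration, and not the authors' implementation, the sketch below writes out the standard penalty definitions, L1/2 (lam * |b|^(1/2)), SCAD (Fan and Li, requiring a > 2) and MCP (Zhang, requiring gamma > 1), together with their univariate thresholding operators, i.e. the closed-form minimizers of 0.5 * (z - b)^2 + p_lam(b) that coordinate-descent solvers for such penalties typically apply one coefficient at a time. The function names, the defaults a = 3.7 and gamma = 3, and the coordinate-wise setting are assumptions for illustration; the paper's hierarchical decomposition of coefficients across studies is not modeled here.

```python
import math


# Penalty functions p_lam(b) for a single coefficient b (standard definitions).
def half_penalty(b: float, lam: float) -> float:
    """L1/2 penalty: lam * |b|^(1/2)."""
    return lam * math.sqrt(abs(b))


def scad_penalty(b: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty (Fan and Li); requires a > 2."""
    t = abs(b)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(b: float, lam: float, gamma: float = 3.0) -> float:
    """Minimax concave penalty (Zhang); requires gamma > 1."""
    t = abs(b)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


# Thresholding operators: each returns argmin_b 0.5*(z - b)**2 + p_lam(b).
def soft_threshold(z: float, lam: float) -> float:
    return math.copysign(max(abs(z) - lam, 0.0), z)


def half_threshold(z: float, lam: float) -> float:
    # Xu et al.'s half-thresholding formula is stated for (z - b)**2 + mu*|b|**0.5,
    # so the 0.5-scaled objective used here corresponds to mu = 2 * lam.
    mu = 2.0 * lam
    az = abs(z)
    if az <= (54 ** (1 / 3) / 4) * mu ** (2 / 3):
        return 0.0
    phi = math.acos((mu / 8) * (az / 3) ** -1.5)
    return (2 / 3) * z * (1 + math.cos(2 * math.pi / 3 - 2 * phi / 3))


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    az = abs(z)
    if az <= 2 * lam:
        return soft_threshold(z, lam)
    if az <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z  # large signals are left unshrunk


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) * gamma / (gamma - 1)
    return z  # large signals are left unshrunk


if __name__ == "__main__":
    for z in (0.5, 1.2, 2.5, 5.0):
        print(z, half_threshold(z, 1.0), scad_threshold(z, 1.0), mcp_threshold(z, 1.0))
```

Running the file shows, for lam = 1, that all three operators set small inputs exactly to zero, while SCAD and MCP return large inputs unchanged and L1/2 shrinks them by an amount that decreases as |z| grows. This reduced bias on large coefficients is the usual motivation for nonconvex penalties over the lasso.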

Similar articles

Multi-view based integrative analysis of gene expression data for identifying biomarkers.

Sci Rep. 2019 Sep 18;9(1):13504. doi: 10.1038/s41598-019-49967-4.

COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION.

Ann Appl Stat. 2011 Jan 1;5(1):232-253. doi: 10.1214/10-AOAS388.

Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification.

BMC Bioinformatics. 2013 Jun 19;14:198. doi: 10.1186/1471-2105-14-198.

Smoothly clipped absolute deviation (SCAD) regularization for compressed sensing MRI using an augmented Lagrangian scheme.

Magn Reson Imaging. 2013 Oct;31(8):1399-411. doi: 10.1016/j.mri.2013.05.010. Epub 2013 Jul 24.

Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

Comput Math Methods Med. 2014;2014:857398. doi: 10.1155/2014/857398. Epub 2014 Nov 24.

Novel Regularization Method for Biomarker Selection and Cancer Classification.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1329-1340. doi: 10.1109/TCBB.2019.2897301. Epub 2019 Feb 4.

Meta-analysis based variable selection for gene expression data.

Biometrics. 2014 Dec;70(4):872-80. doi: 10.1111/biom.12213. Epub 2014 Sep 5.

A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression.

BMC Bioinformatics. 2022 Aug 23;23(Suppl 10):353. doi: 10.1186/s12859-022-04887-5.

A Semismooth Newton Algorithm for High-Dimensional Nonconvex Sparse Learning.

IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2993-3006. doi: 10.1109/TNNLS.2019.2935001. Epub 2019 Sep 12.

Cited by

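A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression.

BMC Bioinformatics. 2022 Aug 23;23(Suppl 10):353. doi: 10.1186/s12859-022-04887-5.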

Meta-Analyzing Multiple Omics Data With Robust Variable Selection.

Front Genet. 2021 Jul 5;12:656826. doi: 10.3389/fgene.2021.656826. eCollection 2021.

References

Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction.

J Am Stat Assoc. 2020;115(531):1125-1138. doi: 10.1080/01621459.2019.1671197. Epub 2019 Oct 29.

BAYESIAN LATENT HIERARCHICAL MODEL FOR TRANSCRIPTOMIC META-ANALYSIS TO DETECT BIOMARKERS WITH CLUSTERED META-PATTERNS OF DIFFERENTIAL EXPRESSION SIGNALS.

Ann Appl Stat. 2019 Mar;13(1):340-366. doi: 10.1214/18-AOAS1188. Epub 2019 Apr 10.

Novel Regularization Method for Biomarker Selection and Cancer Classification.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1329-1340. doi: 10.1109/TCBB.2019.2897301. Epub 2019 Feb 4.

Network-based logistic regression integration method for biomarker identification.

BMC Syst Biol. 2018 Dec 31;12(Suppl 9):135. doi: 10.1186/s12918-018-0657-8.

SPP1 and AGER as potential prognostic biomarkers for lung adenocarcinoma.

Oncol Lett. 2018 May;15(5):7028-7036. doi: 10.3892/ol.2018.8235. Epub 2018 Mar 12.

Manifold optimization-based analysis dictionary learning with an ℓ1/2-norm regularizer.

Neural Netw. 2018 Feb;98:212-222. doi: 10.1016/j.neunet.2017.11.015. Epub 2017 Dec 6.

Long non-coding RNA AGER-1 functionally upregulates the innate immunity gene AGER and approximates its anti-tumor effect in lung cancer.

Mol Carcinog. 2018 Mar;57(3):305-318. doi: 10.1002/mc.22756. Epub 2017 Nov 14.

A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis.

Sci Rep. 2017 Oct 12;7(1):13053. doi: 10.1038/s41598-017-13133-5.

Meta-analytic support vector machine for integrating multiple omics data.

BioData Min. 2017 Jan 26;10:2. doi: 10.1186/s13040-017-0126-8. eCollection 2017.

Gene Coexpression Analyses Differentiate Networks Associated with Diverse Cancers Harboring TP53 Missense or Null Mutations.

Front Genet. 2016 Aug 3;7:137. doi: 10.3389/fgene.2016.00137. eCollection 2016.
