High-dimensional variable selection for ordinal outcomes with error control.

Affiliation

Ohio State University.

Publication information

Brief Bioinform. 2021 Jan 18;22(1):334-345. doi: 10.1093/bib/bbaa007.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7820886/

Abstract

Many high-throughput genomic applications involve a large set of potential covariates and a response that is frequently measured on an ordinal scale, and it is crucial to identify which variables are truly associated with the response. Effectively controlling the false discovery rate (FDR) without sacrificing power has been a major challenge in variable selection research. This study reviews two existing variable selection frameworks, model-X knockoffs and a modified version of reference distribution variable selection (RDVS), both of which use artificial variables as benchmarks for decision making. Model-X knockoffs constructs a 'knockoff' variable for each covariate to mimic the covariance structure, while RDVS generates only one null variable and forms a reference distribution by performing multiple runs of model fitting. Herein, we describe how different importance measures for ordinal responses can be constructed to fit into these two selection frameworks, using either penalized regression or machine learning techniques. We compared these measures in terms of FDR and power using simulated data. Moreover, we applied both frameworks to high-throughput methylation data to identify features associated with the progression from normal liver tissue to hepatocellular carcinoma, further comparing and contrasting their performance.
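To make the knockoff selection step concrete, the following is a minimal sketch of the standard knockoff+ thresholding rule. It assumes that an importance statistic W_j has already been computed for each covariate as the original variable's importance minus its knockoff's importance, so that large positive values favor the original. The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical, and the abstract does not state which thresholding variant or importance measure the study used, so this illustrates the general framework rather than the authors' procedure.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the smallest t > 0 whose estimated FDR is at most q.

    The estimate is (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}).
    Returns infinity when no candidate threshold meets the target.
    """
    # Only the nonzero magnitudes of w can change the counts below.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # knockoff beat the original
        positives = sum(1 for x in w if x >= t)   # original beat the knockoff
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of covariates whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative (made-up) statistics for ten covariates, target FDR q = 0.2.
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.2, -0.1, 2.0, 1.5]
selected = knockoff_select(w_example, q=0.2)
```

The count of large negative statistics stands in for the unknown number of false positives, which is what lets knockoff variables act as benchmarks for decision making.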
