A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology.

Author information

Bastien B, Boukhobza T, Dumond H, Gégout-Petit A, Muller-Gueudin A, Thiébaut C

Affiliations

Transgene S.A., Illkirch-Graffenstaden Cedex, France.

Université de Lorraine, CNRS, CRAN, Nancy, France.

Publication information

J Appl Stat. 2020 Oct 27;49(3):764-781. doi: 10.1080/02664763.2020.1837083. eCollection 2022.

DOI:10.1080/02664763.2020.1837083

PMID:35706767

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9041748/

Abstract

We propose a new methodology for selecting and ranking covariates associated with a variable of interest in high-dimensional data with dependent covariates and few observations. The methodology chains four successive steps: clustering of the covariates, decorrelation of the covariates using Factor Latent Analysis, selection by aggregating adapted methods, and finally ranking. A simulation study shows the benefit of decorrelating within the different clusters of covariates. We first apply the method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. We then apply it to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, with the aim of personalizing treatment.
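The four steps named in the abstract (clustering, within-cluster decorrelation, aggregated selection, ranking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so every function name, threshold, and the toy simulation below are assumptions made for illustration. In particular, clustering is approximated by connected components of a thresholded correlation graph, the factor-analysis decorrelation is replaced by removing the leading principal component of each cluster, and the "aggregation of adapted methods" is replaced by averaging the ranks from two simple criteria (absolute Pearson and Spearman correlation with a continuous response).

```python
import numpy as np


def cluster_covariates(X, threshold=0.4):
    """Label covariates by connected components of the graph |corr| > threshold.
    (Assumed stand-in for the clustering step.)"""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(corr.shape[0], -1)
    n_clusters = 0
    for start in range(corr.shape[0]):
        if labels[start] >= 0:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            j = stack.pop()
            neighbours = np.flatnonzero((corr[j] > threshold) & (labels < 0))
            labels[neighbours] = n_clusters
            stack.extend(neighbours.tolist())
        n_clusters += 1
    return labels


def decorrelate_within_clusters(X, labels, n_factors=1):
    """Remove the leading principal components inside each cluster.
    (PCA-based stand-in for the factor-analysis decorrelation.)"""
    X_dec = X - X.mean(axis=0)
    for c in np.unique(labels):
        cols = np.flatnonzero(labels == c)
        if cols.size <= n_factors:
            continue  # too few covariates to remove n_factors components
        U, s, Vt = np.linalg.svd(X_dec[:, cols], full_matrices=False)
        s[:n_factors] = 0.0  # drop the shared (latent) directions
        X_dec[:, cols] = (U * s) @ Vt
    return X_dec


def _abs_corr(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def rank_covariates(X, y):
    """Order covariates by the average of two ranks: |Pearson| and |Spearman|.
    Returns covariate indices from most to least associated with y."""
    pearson = _abs_corr(X, y)
    # Spearman = Pearson on rank-transformed data (ties ignored: continuous data)
    spearman = _abs_corr(X.argsort(axis=0).argsort(axis=0).astype(float),
                         y.argsort().argsort().astype(float))
    rank_p = (-pearson).argsort().argsort()  # rank 0 = strongest association
    rank_s = (-spearman).argsort().argsort()
    return np.argsort((rank_p + rank_s) / 2.0, kind="stable")


def simulate(n=200, cluster_size=10, n_clusters=3, loading=1.5, seed=0):
    """Toy data: each cluster shares one latent factor; y depends on covariate 0 only."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n, n_clusters))
    noise = rng.normal(size=(n, n_clusters * cluster_size))
    X = loading * np.repeat(factors, cluster_size, axis=1) + noise
    y = X[:, 0] + 0.5 * rng.normal(size=n)
    return X, y


X, y = simulate()
labels = cluster_covariates(X)
X_dec = decorrelate_within_clusters(X, labels)
order = rank_covariates(X_dec, y)
print("clusters found:", len(np.unique(labels)))
print("five top-ranked covariates:", order[:5].tolist())
```

In this toy setting every covariate in the first cluster is marginally correlated with `y` through the shared factor, which is the situation where decorrelating inside clusters before selection is meant to help. The toy sample size is larger than in the applications described above and a continuous response is used purely to keep the sketch short.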
