Determining relative importance of variables in developing and validating predictive models.

Affiliation

Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada.

Publication information

BMC Med Res Methodol. 2009 Sep 14;9:64. doi: 10.1186/1471-2288-9-64.

DOI:10.1186/1471-2288-9-64

PMID:19751506

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2761416/

Abstract

BACKGROUND

Multiple regression models are used in a wide range of scientific disciplines, and automated model selection procedures are frequently used to identify independent predictors. However, determining the relative importance of potential predictors and validating the fitted models for stability, predictive accuracy and generalizability are often overlooked or not done thoroughly.

METHODS

Using a case study aimed at predicting children with acute lymphoblastic leukemia (ALL) who are at low risk of Tumor Lysis Syndrome (TLS), we propose and compare two strategies, bootstrapping and random split of data, for ordering potential predictors according to their relative importance with respect to model stability and generalizability. We also propose an approach based on relative increase in percentage of explained variation and area under the Receiver Operating Characteristic (ROC) curve for developing models where variables from our ordered list enter the model according to their importance. An additional data set aimed at identifying predictors of prostate cancer penetration is also used for illustrative purposes.
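The bootstrapping strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the variable names (`age`, `wbc`, `noise`) are hypothetical, and a simple two-sample z-test at the 5% level stands in for the automated model selection procedure that the paper applies to each resample. The sketch only shows the idea of ranking predictors by how often they are selected across bootstrap samples.

```python
import math
import random

def z_stat(values, outcomes):
    """Welch z statistic comparing a variable between outcome groups (1 vs 0)."""
    g1 = [v for v, y in zip(values, outcomes) if y == 1]
    g0 = [v for v, y in zip(values, outcomes) if y == 0]
    if len(g1) < 2 or len(g0) < 2:
        return 0.0
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    v1 = sum((x - m1) ** 2 for x in g1) / (len(g1) - 1)
    v0 = sum((x - m0) ** 2 for x in g0) / (len(g0) - 1)
    se = math.sqrt(v1 / len(g1) + v0 / len(g0))
    return (m1 - m0) / se if se > 0 else 0.0

def select(rows, outcomes, names, crit=1.96):
    """Stand-in selection rule (assumption): keep variables with |z| > crit."""
    return {name for j, name in enumerate(names)
            if abs(z_stat([r[j] for r in rows], outcomes)) > crit}

def bootstrap_inclusion(rows, outcomes, names, n_boot=200, seed=0):
    """Share of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    n = len(rows)
    counts = dict.fromkeys(names, 0)
    for _ in range(n_boot):
        # Resample n observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = select([rows[i] for i in idx], [outcomes[i] for i in idx], names)
        for name in chosen:
            counts[name] += 1
    return {name: counts[name] / n_boot for name in names}

def simulate(n=300, seed=1):
    """Synthetic data: strong 'age' effect, weaker 'wbc' effect, unrelated 'noise'."""
    rng = random.Random(seed)
    rows, outcomes = [], []
    for _ in range(n):
        age, wbc, noise = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(1.5 * age + 0.5 * wbc)))
        rows.append((age, wbc, noise))
        outcomes.append(1 if rng.random() < p else 0)
    return rows, outcomes

NAMES = ["age", "wbc", "noise"]
rows, outcomes = simulate()
freq = bootstrap_inclusion(rows, outcomes, NAMES)
# Order variables by how often they were selected, most frequent first.
ranking = sorted(NAMES, key=lambda name: freq[name], reverse=True)
for name in ranking:
    print(f"{name}: selected in {freq[name]:.0%} of bootstrap samples")
```

In practice the `select` function would be replaced by whatever selection procedure is under study (for example stepwise logistic regression); the resampling and counting logic stays the same.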

RESULTS

Age is chosen to be the most important predictor of TLS. It is selected 100% of the time using the bootstrapping approach. Using the random split method, it is selected 99% of the time in the training data and is significant (at the 5% level) 98% of the time in the validation data set. This indicates that age is a stable predictor of TLS with good generalizability. The second most important variable is white blood cell count (WBC). Our methods also identified an important predictor of TLS that would otherwise have been omitted by relying on any of the automated model selection procedures alone. A group at low risk of TLS consists of children younger than 10 years of age, without T-cell immunophenotype, whose baseline WBC is < 20 × 10^9/L and palpable spleen is < 2 cm. For the prostate cancer data set, the Gleason score and digital rectal exam are identified to be the most important indicators of whether tumor has penetrated the prostate capsule.
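The two random-split percentages reported for age (how often a variable is selected in the training data, and how often it is significant in the validation data) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the data are synthetic, the split is 50/50, a two-sample z-test at the 5% level stands in for both the selection step and the validation test, and the two shares are computed independently of each other.

```python
import math
import random

def z_stat(x, y):
    """Welch z statistic for the difference in mean of x between y == 1 and y == 0."""
    a = [v for v, t in zip(x, y) if t == 1]
    b = [v for v, t in zip(x, y) if t == 0]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def random_split_summary(x, y, n_splits=200, crit=1.96, seed=0):
    """Return (share selected in training half, share significant in validation half)."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    half = len(idx) // 2
    train_hits = valid_hits = 0
    for _ in range(n_splits):
        rng.shuffle(idx)  # new random split on every repetition
        train, valid = idx[:half], idx[half:]
        if abs(z_stat([x[i] for i in train], [y[i] for i in train])) > crit:
            train_hits += 1
        if abs(z_stat([x[i] for i in valid], [y[i] for i in valid])) > crit:
            valid_hits += 1
    return train_hits / n_splits, valid_hits / n_splits

# Synthetic data (hypothetical): 'age' drives the outcome, 'noise' does not.
rng = random.Random(1)
age, noise, y = [], [], []
for _ in range(400):
    a = rng.gauss(0, 1)
    age.append(a)
    noise.append(rng.gauss(0, 1))
    y.append(1 if rng.random() < 1 / (1 + math.exp(-1.5 * a)) else 0)

age_train, age_valid = random_split_summary(age, y)
noise_train, noise_valid = random_split_summary(noise, y)
print(f"age:   training {age_train:.0%}, validation {age_valid:.0%}")
print(f"noise: training {noise_train:.0%}, validation {noise_valid:.0%}")
```

A variable that scores high on both shares behaves like age in the results above: it is picked up consistently and its effect reproduces in data not used for selection.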

CONCLUSION

Our model selection procedures based on bootstrap re-sampling and repeated random split techniques can be used to assess the strength of evidence that a variable is truly an independent and reproducible predictor. Our methods, therefore, can be used for developing stable and reproducible models with good performance. Moreover, our methods can serve as a good tool for validating a predictive model. Previous biological and clinical studies support the findings based on our selection and validation strategies. However, extensive simulations may be required to assess the performance of our methods under different scenarios as well as to check their sensitivity to random fluctuations in the data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/243f/2761416/04a4c66387d5/1471-2288-9-64-1.jpg
