Discretizing multiple continuous predictors with U-shaped relationships with lnOR: introducing the recursive gradient scanning method in clinical and epidemiological research.

Author information

Yang Shuo, Su Huaan, Zhang Nanxiang, Han Yuduan, Ge Yingfeng, Fei Yi, Liu Ying, Hilowle Abdullahi, Xu Peng, Zhang Jinxin

Affiliations

Department of Medical Statistics, School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China.

The People's Hospital of Jiangmen, No. 172 Gaodi Li, Pengjiang District, Jiangmen, Guangdong, 529000, China.

Publication information

BMC Med Res Methodol. 2025 Mar 12;25(1):70. doi: 10.1186/s12874-025-02522-4.

DOI:10.1186/s12874-025-02522-4

PMID:40075286

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11900475/

Abstract

BACKGROUND

Clinical prediction models often assume a linear relationship between continuous predictors and outcomes, but truly linear relationships are rare, and the assumption can produce biased estimates and inaccurate conclusions. Our research group previously addressed the case of a single U-shaped independent variable. Modelling multiple U-shaped predictors can improve predictive accuracy by capturing nuanced relationships, but it also increases model complexity and the risk of overfitting. This study extends our previous results to this more common scenario, facilitating more comprehensive and practical investigations.

METHODS

We propose a novel approach, the Recursive Gradient Scanning (RGS) method, for discretizing multiple continuous variables that exhibit U-shaped relationships with the natural logarithm of the odds ratio (lnOR). RGS proceeds in two steps: first, it conducts a fine screening from the 2.5th to the 97.5th percentiles of the lnOR; then, it uses an iterative process that compares AIC values to identify the optimal categorical variables. We conducted a Monte Carlo simulation study to evaluate the performance of RGS, varying the correlation level, sample size, missing rate, and symmetry of the U-shaped relationships. To compare RGS with other common approaches (such as the median, Q-Q, and minimum P-value methods), we used a real dataset to assess both the predictive ability (e.g., AUC) and the goodness of fit (e.g., AIC) of logistic regression models with variables discretized at the different cut-points.
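The two-step idea described above (a fine scan over candidate cut-points, then an iterative comparison of AIC values) can be illustrated with a minimal sketch. This is not the authors' RGS implementation: it is a simple coordinate-wise grid search written for illustration, and it makes several assumptions that are not in the abstract. Each predictor is split into three levels (low, middle as reference, high), the candidate cut-points are taken between the 2.5th and 97.5th percentiles of the predictor itself, and all function names (`fit_logistic_aic`, `three_level_dummies`, `build_design`, `rgs_sketch`, `simulate_u_shaped`) are hypothetical.

```python
import numpy as np


def fit_logistic_aic(X, y, n_iter=50, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson and return AIC = 2k - 2*loglik."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p)
        # Tiny ridge term only guards against a numerically singular Hessian.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(X @ beta, -30, 30)
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return 2 * X.shape[1] - 2 * loglik


def three_level_dummies(x, lo, hi):
    """Dummy-code x as low (x < lo) and high (x > hi); the middle is the reference."""
    return np.column_stack([(x < lo).astype(float), (x > hi).astype(float)])


def build_design(X, cuts):
    """Stack the dummy columns of every predictor given its (lo, hi) cut-points."""
    return np.column_stack(
        [three_level_dummies(X[:, j], *cuts[j]) for j in range(X.shape[1])]
    )


def rgs_sketch(X, y, n_grid=15, max_rounds=5):
    """Illustrative coordinate-wise scan (assumption, not the published algorithm).

    Starting from tertiles, re-scan one predictor's cut-point pair at a time
    while holding the others fixed, and keep a pair only if it lowers the AIC.
    Stop when a full round over all predictors brings no improvement.
    """
    n_vars = X.shape[1]
    cuts = [tuple(np.quantile(X[:, j], [1 / 3, 2 / 3])) for j in range(n_vars)]
    best_aic = fit_logistic_aic(build_design(X, cuts), y)
    for _ in range(max_rounds):
        improved = False
        for j in range(n_vars):
            # Candidate cut-points: 2.5th to 97.5th percentiles of predictor j.
            grid = np.quantile(X[:, j], np.linspace(0.025, 0.975, n_grid))
            for i, lo in enumerate(grid):
                for hi in grid[i + 1:]:
                    trial = cuts[:j] + [(lo, hi)] + cuts[j + 1:]
                    aic = fit_logistic_aic(build_design(X, trial), y)
                    if aic < best_aic - 1e-9:
                        best_aic, cuts, improved = aic, trial, True
        if not improved:
            break
    return cuts, best_aic


def simulate_u_shaped(n=3000, seed=0):
    """Toy data: two independent predictors, each with a symmetric U-shaped lnOR."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    eta = -1.5 + X[:, 0] ** 2 + X[:, 1] ** 2
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    return X, y
```

Calling `cuts, aic = rgs_sketch(*simulate_u_shaped())` returns one `(lo, hi)` pair per predictor together with the AIC of the resulting logistic model. The toy simulation uses independent, fully observed predictors, so it does not reproduce the correlation, missingness, or asymmetry scenarios considered in the study.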

RESULTS

Both the simulation and the empirical study consistently demonstrated the effectiveness of RGS. In the simulations, RGS outperformed other common discretization methods in the discrimination ability and overall performance of logistic regression models across U-shaped scenarios with varying correlation levels, sample sizes, missing rates, and symmetry levels. Similarly, the empirical study showed that the optimal cut-points identified by RGS had superior clinical predictive power, as measured by metrics such as AUC, compared with traditional methods.

CONCLUSIONS

The simulation and empirical studies demonstrated that RGS outperformed other common discretization methods in goodness of fit and predictive ability. Future work will address challenges related to separation or missing binary responses, and more data will be required to validate the method.

