An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison.

Author information

Lei Yang, Nollen Nikki, Ahluwahlia Jasjit S, Yu Qing, Mayo Matthew S

Affiliations

Department of Biostatistics, The University of Kansas Medical Center, Kansas City, KS, USA.

Department of Preventive Medicine and Public Health, The University of Kansas Medical Center, Kansas City, KS, USA.

Publication information

BMC Public Health. 2015 Apr 9;15:341. doi: 10.1186/s12889-015-1582-z.

Abstract

BACKGROUND

Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who use both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We fit a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of CART for identifying high-risk subgroups of ATP users among cigarette smokers.

METHODS

The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers and as African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample of 2,376 participants was divided into independent training and validation samples for hold-out validation. Logistic regression and CART models were used to examine the important predictors of being a cigarette + ATP user.
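A hold-out split of this kind can be sketched as a seeded random partition of participant IDs into two disjoint sets. This is a minimal illustration, not the authors' procedure: the function name `holdout_split`, the 50/50 proportion, and the seed are all hypothetical, because the abstract does not report how the 2,376 participants were divided.

```python
import random


def holdout_split(ids, train_fraction=0.5, seed=2012):
    """Randomly partition participant IDs into disjoint training and
    validation sets. train_fraction and seed are illustrative assumptions:
    the abstract does not state the proportions used in the study."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(ids)                   # copy; the caller's data is not modified
    random.Random(seed).shuffle(shuffled)  # seeded local RNG for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical usage: 2,376 participant IDs, split 50/50.
train_ids, valid_ids = holdout_split(range(2376))
print(len(train_ids), len(valid_ids))  # 1188 1188
```

Both models would then be fit on the training set only, and the validation set used once to compute the held-out performance figures.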

RESULTS

The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying versus borrowing cigarettes, whether the price of cigarettes influences the brand purchased, whether participants set limits on cigarettes per day, alcohol use score, and discrimination frequency. The model's c-index was 0.74, indicating good discriminatory capability. It also performed well in the validation cohort, with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression).

The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequency as the most important factors. It also revealed interesting partial interactions. The c-index was 0.70 for the training sample and 0.69 for the validation sample, and the misclassification rate was 0.342 for the training sample and 0.346 for the validation sample. The CART model was easier to interpret and identified target populations of clinical significance.
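The performance measures reported above can be computed from each model's predicted probabilities. The sketch below is pure Python and is not the authors' code. For a binary outcome, the c-index is the share of (case, non-case) pairs in which the case receives the higher predicted risk, with ties counted as one half; this equals the area under the ROC curve. The 0.5 classification threshold and the grouping scheme in `calibration_r_squared` (regressing observed event rates on mean predicted risk across risk-ordered groups) are assumptions, since the abstract does not define its cut-off or its calibration regression.

```python
from statistics import mean


def c_index(y, p):
    """Concordance index for a binary outcome y (0/1) and predicted risks p."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0   # concordant pair
            elif pc == pn:
                score += 0.5   # tied prediction counts one half
    return score / (len(cases) * len(controls))


def misclassification_rate(y, p, threshold=0.5):
    """Fraction of participants whose thresholded prediction differs from y."""
    errors = sum((pi >= threshold) != (yi == 1) for yi, pi in zip(y, p))
    return errors / len(y)


def calibration_r_squared(y, p, n_groups=10):
    """R-square of a least-squares line of observed event rate on mean
    predicted risk across n_groups risk-ordered groups (assumed scheme)."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    size = len(order) / n_groups
    xs, ys = [], []
    for g in range(n_groups):
        idx = order[round(g * size):round((g + 1) * size)]
        if idx:
            xs.append(mean(p[i] for i in idx))  # mean predicted risk in group
            ys.append(mean(y[i] for i in idx))  # observed event rate in group
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    ss_tot = sum((yv - y_bar) ** 2 for yv in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("groups must differ in predicted and observed rates")
    sxy = sum((x - x_bar) * (yv - y_bar) for x, yv in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yv - (intercept + slope * x)) ** 2 for x, yv in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: two non-users (0) and two dual users (1).
y_obs = [0, 0, 1, 1]
p_hat = [0.10, 0.40, 0.35, 0.80]
print(c_index(y_obs, p_hat))                 # 0.75
print(misclassification_rate(y_obs, p_hat))  # 0.25
```

The same functions apply to CART, whose predicted probability for a participant is the proportion of cases in that participant's terminal node; because many participants share a node, CART predictions contain more ties, which partly explains why its c-index tends to be lower.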

CONCLUSION

This study suggests that the non-parametric CART model is parsimonious, potentially easier to interpret, and provides additional information in identifying the subgroups at high risk of ATP use among cigarette smokers.
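The non-parametric character of CART comes from how it grows a tree: instead of assuming a functional form, it repeatedly chooses the binary split that most reduces node impurity, and the resulting branches define subgroups. The sketch below shows a single split search on one numeric predictor using Gini impurity, the usual criterion for classification trees. It is not the authors' code; the abstract does not state which software or splitting criterion they used, and the `age`/`atp` data are hypothetical. A full CART implementation would also search every predictor, recurse on each child node, and prune the tree (for example, by cost-complexity pruning).

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Search thresholds t on one numeric predictor x for the split
    (x <= t vs x > t) that minimises the size-weighted Gini impurity of y.
    Returns (threshold, impurity); threshold is None if no split helps."""
    n = len(y)
    best = (None, gini(y))
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (t, impurity)
    return best


# Hypothetical data: age and dual cigarette + ATP use (1 = yes).
age = [25, 28, 31, 35, 45, 52, 60, 67]
atp = [1, 1, 1, 0, 0, 0, 1, 0]
threshold, impurity = best_split(age, atp)
print(threshold)  # 31
```

Here the split "age <= 31" isolates an all-user subgroup, which is the kind of directly readable rule that makes a tree easier to interpret than a table of regression coefficients.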

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1383/4415362/876d02a36b96/12889_2015_1582_Fig1_HTML.jpg
