Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections.

Authors

Mansiaux Yohann, Carrat Fabrice

Affiliation

INSERM, UMR_S 1136, Institut Pierre Louis d'Epidémiologie et de Santé Publique, F-75013 Paris, France.

Publication

BMC Med Res Methodol. 2014 Aug 26;14:99. doi: 10.1186/1471-2288-14-99.

DOI: 10.1186/1471-2288-14-99
PMID: 25154404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4146451/
Abstract

BACKGROUND

Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis in detecting independent associations between exposures and a health outcome.

METHODS

We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected using four approaches: two data mining methods, Random Forests (RF) and Boosted Regression Trees (BRT); the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression, UFMLR); and the Least Absolute Shrinkage and Selection Operator (LASSO), which adds a penalty to multivariate logistic regression to achieve a sparse selection of covariates. We developed permutation tests to assess the statistical significance of associations. We simulated 500 datasets of similar size to estimate the True Positive Rate (TPR) and False Positive Rate (FPR) of each method.
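The abstract states that permutation tests were developed to assess significance but does not describe them. As a generic illustration only, not the authors' procedure, the sketch below computes a permutation p-value for one covariate by shuffling the infection labels. The difference-in-means statistic, the function names and the toy data are hypothetical stand-ins; in a real analysis the statistic would presumably be the score produced by each selection method (for example a variable-importance measure), which is an assumption here.

```python
import random
from statistics import mean


def mean_difference(x, y):
    """Absolute difference in mean covariate value between infected (y == 1)
    and uninfected (y == 0) subjects. Hypothetical, illustrative statistic."""
    cases = [xi for xi, yi in zip(x, y) if yi == 1]
    controls = [xi for xi, yi in zip(x, y) if yi == 0]
    return abs(mean(cases) - mean(controls))


def permutation_pvalue(x, y, statistic=mean_difference, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed statistic.

    Shuffling the outcome labels breaks any link between x and y while
    keeping the number of infected subjects fixed, which yields the null
    distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if statistic(x, y_perm) >= observed:
            exceed += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 100 subjects, 14 infected (about the 14% in the abstract).
    y = [1] * 14 + [0] * 86
    # A covariate that separates infected from uninfected subjects perfectly.
    x_assoc = [0.0] * 14 + [1.0] * 86
    print(permutation_pvalue(x_assoc, y))  # a small p-value
```

Fixing the random seed makes the p-value reproducible; with many covariates, the resulting p-values would still need to be compared against the chosen nominal level (5% in the abstract).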

RESULTS

Between 3 and 24 covariates (1%-8%) were identified as associated with influenza infection, depending on the method. The pre-seasonal haemagglutination inhibition antibody titer was the only covariate selected by all methods, while 266 (87%) covariates were not selected by any method. At the 5% nominal significance level, the TPR was 85% with RF, 80% with BRT, 26% to 49% with UFMLR, and 71% to 78% with LASSO. Conversely, the FPR was 4% with RF and BRT, 9% to 2% with UFMLR, and 9% to 4% with LASSO.
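The abstract reports TPR and FPR estimated from 500 simulated datasets but does not say exactly how they were aggregated. The sketch below assumes the usual per-dataset definitions (TPR = fraction of truly associated covariates that a method selects; FPR = fraction of non-associated covariates that it selects), averaged over the simulated datasets. The function name, covariate names and selections are hypothetical.

```python
def selection_rates(selected_sets, true_covariates, all_covariates):
    """Mean true and false positive rates of a covariate-selection method
    across simulated datasets with a known set of truly associated covariates."""
    true_set = set(true_covariates)
    null_set = set(all_covariates) - true_set
    tpr_values, fpr_values = [], []
    for selected in selected_sets:
        selected = set(selected)
        # Share of truly associated covariates that the method recovered.
        tpr_values.append(len(selected & true_set) / len(true_set))
        # Share of non-associated covariates that the method wrongly flagged.
        fpr_values.append(len(selected & null_set) / len(null_set))
    n = len(selected_sets)
    return sum(tpr_values) / n, sum(fpr_values) / n


if __name__ == "__main__":
    covariates = [f"x{i}" for i in range(10)]  # hypothetical covariate names
    truth = {"x0", "x1"}                         # assumed truly associated covariates
    # Covariates selected by some method in two hypothetical simulated datasets.
    selections = [{"x0", "x1", "x5"}, {"x0"}]
    tpr, fpr = selection_rates(selections, truth, covariates)
    print(tpr, fpr)  # 0.75 0.0625
```

Averaging per-dataset rates weights every simulated dataset equally; pooling counts across datasets would give the same result here because the number of true and null covariates is fixed.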

CONCLUSIONS

Data mining methods and LASSO should be considered as valuable methods to detect independent associations in large epidemiologic datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e809/4146451/77877ca5e765/1471-2288-14-99-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e809/4146451/a9bc651abec4/1471-2288-14-99-1.jpg

Similar articles

1
Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections.
BMC Med Res Methodol. 2014 Aug 26;14:99. doi: 10.1186/1471-2288-14-99.
2
Antibody persistence and serological protection among seasonal 2007 influenza A(H1N1) infected subjects: Results from the FLUREC cohort study.
Vaccine. 2015 Dec 8;33(49):7015-21. doi: 10.1016/j.vaccine.2015.09.016. Epub 2015 Sep 19.
3
Seasonal H1N1 2007 influenza virus infection is associated with elevated pre-exposure antibody titers to the 2009 pandemic influenza A (H1N1) virus.
Clin Microbiol Infect. 2011 May;17(5):732-7. doi: 10.1111/j.1469-0691.2010.03352.x. Epub 2010 Oct 29.
4
Asymptomatic ratio for seasonal H1N1 influenza infection among schoolchildren in Taiwan.
BMC Infect Dis. 2014 Feb 12;14:80. doi: 10.1186/1471-2334-14-80.
5
H1N1 hemagglutinin-inhibition seroprevalence in Emergency Department Health Care workers after the first wave of the 2009 influenza pandemic.
Pediatr Emerg Care. 2011 Sep;27(9):804-7. doi: 10.1097/PEC.0b013e31822c125e.
6
Evaluation of Preexisting Anti-Hemagglutinin Stalk Antibody as a Correlate of Protection in a Healthy Volunteer Challenge with Influenza A/H1N1pdm Virus.
mBio. 2018 Jan 23;9(1):e02284-17. doi: 10.1128/mBio.02284-17.
7
Improved serological response to H1N1 monovalent vaccine associated with viral suppression among HIV-1-infected patients during the 2009 influenza (H1N1) pandemic in the Southern Hemisphere.
HIV Med. 2012 Jul;13(6):352-7. doi: 10.1111/j.1468-1293.2011.00987.x. Epub 2012 Feb 2.
8
Assessment of baseline age-specific antibody prevalence and incidence of infection to novel influenza A/H1N1 2009.
Health Technol Assess. 2010 Dec;14(55):115-92. doi: 10.3310/hta14550-03.
9
Kinetics, Longevity, and Cross-Reactivity of Antineuraminidase Antibody after Natural Infection with Influenza A Viruses.
Clin Vaccine Immunol. 2017 Dec 5;24(12). doi: 10.1128/CVI.00248-17. Print 2017 Dec.
10
A systematic comparison of statistical methods to detect interactions in exposome-health associations.
Environ Health. 2017 Jul 14;16(1):74. doi: 10.1186/s12940-017-0277-6.

Cited by

1
The dark side of personality functioning: associations between antisocial cognitions, personality functioning (AMPD), empathy and mentalisation.
Front Psychiatry. 2024 May 28;15:1377177. doi: 10.3389/fpsyt.2024.1377177. eCollection 2024.
2
Machine Learning-Derived Baseline Visual Field Patterns Predict Future Glaucoma Onset in the Ocular Hypertension Treatment Study.
Invest Ophthalmol Vis Sci. 2024 Feb 1;65(2):35. doi: 10.1167/iovs.65.2.35.
3
Work Task Association with Lead Urine and Blood Concentrations in Informal Electronic Waste Recyclers in Thailand and Chile.
Int J Environ Res Public Health. 2021 Oct 9;18(20):10580. doi: 10.3390/ijerph182010580.
4
Leveraging artificial intelligence for pandemic preparedness and response: a scoping review to identify key use cases.
NPJ Digit Med. 2021 Jun 10;4(1):96. doi: 10.1038/s41746-021-00459-8.
5
Identifying the predictors of Covid-19 infection outcomes and development of prediction models.
J Infect Public Health. 2021 Jun;14(6):751-756. doi: 10.1016/j.jiph.2021.03.006. Epub 2021 Mar 18.
6
Data Mining in Healthcare: Applying Strategic Intelligence Techniques to Depict 25 Years of Research Development.
Int J Environ Res Public Health. 2021 Mar 17;18(6):3099. doi: 10.3390/ijerph18063099.
7
Identifying correlates of Guinea worm (Dracunculus medinensis) infection in domestic dog populations.
PLoS Negl Trop Dis. 2020 Sep 14;14(9):e0008620. doi: 10.1371/journal.pntd.0008620. eCollection 2020 Sep.
8
Comparison of predictive models for hepatitis C co-infection among HIV patients in Cambodia.
BMC Infect Dis. 2020 Mar 12;20(1):209. doi: 10.1186/s12879-020-4909-z.
9
Influenza activity prediction using meteorological factors in a warm temperate to subtropical transitional zone, Eastern China.
Epidemiol Infect. 2019 Dec 20;147:e325. doi: 10.1017/S0950268819002140.
10
A Data Mining Approach Identified Salivary Biomarkers That Discriminate between Two Obesity Measures.
J Obes. 2019 May 19;2019:9570218. doi: 10.1155/2019/9570218. eCollection 2019.