Random forests for the analysis of matched case-control studies.

Affiliations

Chair of Epidemiology, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany.

Institute of Medical Biometry, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, Germany.

Publication information

BMC Bioinformatics. 2024 Aug 1;25(1):253. doi: 10.1186/s12859-024-05877-5.

DOI:10.1186/s12859-024-05877-5

PMID:39090608

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11292918/

Abstract

BACKGROUND

Conditional logistic regression trees have been proposed as a flexible alternative to conditional logistic regression, the standard method for the analysis of matched case-control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from relatively high variability. Other machine learning methods for the analysis of matched case-control studies are lacking because conventional machine learning methods cannot handle the matched structure of the data.

RESULTS

A random forest method for the analysis of matched case-control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and within an application to real-world data from a matched case-control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer.

CONCLUSIONS

The proposed random forest method is a promising addition to the toolbox for the analysis of matched case-control studies and addresses the need for machine learning methods in this field. It is more flexible than both the standard method of conditional logistic regression and conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable for both exploratory and explanatory analyses.
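
The abstract does not spell out the underlying model. As a rough illustration of the matched structure it refers to, the conditional log-likelihood that standard conditional logistic regression maximizes can be sketched as below. This is a minimal sketch, assuming exactly one case per matched set; the function name `conditional_log_likelihood` and the toy data are hypothetical, and the code is not the tree or forest algorithm of the paper.

```python
import numpy as np


def conditional_log_likelihood(beta, X, y, strata):
    """Conditional logistic log-likelihood for matched sets with one case each.

    Each matched set contributes
        log( exp(x_case @ beta) / sum_j exp(x_j @ beta) ),
    where j runs over all members of the set (case and controls).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    strata = np.asarray(strata)
    eta = X @ np.asarray(beta, dtype=float)  # linear predictor per subject

    total = 0.0
    for s in np.unique(strata):
        in_set = strata == s
        eta_set = eta[in_set]
        eta_case = eta_set[y[in_set] == 1]
        if eta_case.size != 1:
            raise ValueError(f"matched set {s!r} must contain exactly one case")
        # Numerically stable log-sum-exp over the matched set.
        shift = eta_set.max()
        log_denominator = shift + np.log(np.exp(eta_set - shift).sum())
        total += eta_case[0] - log_denominator
    return float(total)


# Hypothetical toy data: three 1:1 matched pairs, two covariates.
X = np.array([[1.0, 0.2], [0.0, 0.5],
              [1.0, 1.0], [1.0, 0.3],
              [0.0, 0.1], [1.0, 0.9]])
y = np.array([1, 0, 1, 0, 0, 1])
strata = np.array([0, 0, 1, 1, 2, 2])

# With beta = 0, every pair contributes log(1/2).
ll_null = conditional_log_likelihood(np.zeros(2), X, y, strata)

# Adding a constant within each matched set leaves the likelihood unchanged.
set_offsets = np.array([5.0, -2.0, 3.0])[strata][:, None]
beta = np.array([0.7, -1.2])
ll_original = conditional_log_likelihood(beta, X, y, strata)
ll_shifted = conditional_log_likelihood(beta, X + set_offsets, y, strata)
```

The last comparison shows why the matched structure matters: anything that is constant within a matched set cancels out of the likelihood, so a method that treats subjects as independent observations is fitting a different objective.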
