
Pseudo-random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning.

Affiliations

From the Department of Epidemiology, Emory University.

Department of Epidemiology, University of Pittsburgh School of Public Health, Atlanta, GA.

Publication information

Epidemiology. 2024 Nov 1;35(6):779-786. doi: 10.1097/EDE.0000000000001785. Epub 2024 Aug 16.

DOI:10.1097/EDE.0000000000001785
PMID:39150879
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11560583/
Abstract

BACKGROUND

The use of machine learning to estimate exposure effects introduces a dependence between the results of an empirical study and the value of the seed used to fix the pseudo-random number generator.

METHODS

We used data from 10,038 pregnant women and a 10% subsample (N = 1004) to examine the extent to which the risk difference for the relation between fruit and vegetable consumption and preeclampsia risk changes under different seed values. We fit an augmented inverse probability weighted estimator with two Super Learner algorithms: a simple algorithm including random forests and single-layer neural networks and a more complex algorithm with a mix of tree-based, regression-based, penalized, and simple algorithms. We evaluated the distributions of risk differences, standard errors, and P values that result from 5000 different seed value selections.
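
The abstract does not write out the estimator. For orientation, one standard form of the augmented inverse probability weighted (AIPW) estimator of a risk difference is sketched here. The notation is an assumption made for this illustration and is not taken from the paper: A is the binary exposure, Y the binary outcome, W the covariates, \hat{g}(W) the fitted propensity score, and \hat{m}_a(W) the fitted outcome regression under exposure level a. In the study, both nuisance functions would be fit with Super Learner, which is where the seed enters.

```latex
\hat{\psi}_{\mathrm{AIPW}}
  = \frac{1}{n}\sum_{i=1}^{n} \hat{\varphi}_i,
\qquad
\hat{\varphi}_i
  = \hat{m}_1(W_i) - \hat{m}_0(W_i)
  + \frac{A_i\,\{Y_i - \hat{m}_1(W_i)\}}{\hat{g}(W_i)}
  - \frac{(1 - A_i)\,\{Y_i - \hat{m}_0(W_i)\}}{1 - \hat{g}(W_i)},
\qquad
\widehat{\mathrm{SE}}
  = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}
      \left(\hat{\varphi}_i - \hat{\psi}_{\mathrm{AIPW}}\right)^{2}}
```

Because \hat{g} and \hat{m}_a depend on random fold assignments and on stochastic learners such as random forests, both the point estimate and the standard error can change when the seed changes, even though the data are fixed.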

RESULTS

Our findings suggest important variability in the risk difference estimates, as well as an important effect of the stacking algorithm used. The interquartile range width of the risk differences in the full sample with the simple algorithm was 13 per 1000. However, all other interquartile ranges were roughly an order of magnitude lower. The medians of the distributions of risk differences differed according to the sample size and the algorithm used.
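
The seed-to-seed spread summarized above can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' analysis: the data are simulated, the nuisance "learners" are plain stratum-specific proportions rather than Super Learner, and the seed only controls the cross-fitting fold split. The function names `simulate` and `aipw_cross_fit`, the fold count, and the 200 seeds are all hypothetical choices made for this illustration.

```python
import random
import statistics


def simulate(n, rng):
    """Toy cohort: binary confounder w, exposure a, outcome y (true RD = -0.05)."""
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.6 if w else 0.3) else 0
        y = 1 if rng.random() < (0.10 + 0.10 * w - 0.05 * a) else 0
        rows.append((w, a, y))
    return rows


def mean_or(values, default):
    """Mean of a list, or a default when the list is empty."""
    return sum(values) / len(values) if values else default


def aipw_cross_fit(rows, seed, k=2):
    """Cross-fit AIPW risk difference; the seed only controls the fold split."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    contributions = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in idx if i not in held]
        # Nuisance estimates from the training folds only (assumed toy learners).
        g, m = {}, {}
        for w in (0, 1):
            g_w = mean_or([a for (ww, a, _) in train if ww == w], 0.5)
            g[w] = min(max(g_w, 0.01), 0.99)  # bound the propensity score
            for a in (0, 1):
                m[(w, a)] = mean_or(
                    [y for (ww, aa, y) in train if ww == w and aa == a], 0.0
                )
        # AIPW contribution for each held-out observation.
        for i in held_out:
            w, a, y = rows[i]
            m1, m0 = m[(w, 1)], m[(w, 0)]
            contributions.append(
                m1 - m0 + a * (y - m1) / g[w] - (1 - a) * (y - m0) / (1 - g[w])
            )
    return sum(contributions) / len(contributions)


# One fixed data set, many seeds: only the fold split changes between runs.
data = simulate(1000, random.Random(2024))
estimates = [aipw_cross_fit(data, seed) for seed in range(200)]
q1, median, q3 = statistics.quantiles(estimates, n=4)
iqr_width = q3 - q1
print(f"median RD per 1000: {1000 * median:.1f}")
print(f"IQR width per 1000: {1000 * iqr_width:.1f}")
```

With learners this simple the spread should be small; the abstract's point is that flexible, stochastic learners can make it much larger. Reporting the distribution of estimates over many seeds, rather than a single run, is one way to make that dependence visible.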

CONCLUSIONS

Our findings add another dimension of concern regarding the potential for "p-hacking," and further warrant the need to move away from simplistic evidentiary thresholds in empirical research. When empirical results depend on pseudo-random number generator seed values, caution is warranted in interpreting these results.
