
Don't Let Your Analysis Go to Seed: On the Impact of Random Seed on Machine Learning-based Causal Inference.

Affiliations

From the Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA.

Department of Medicine, Division of Infectious Disease, Emory University School of Medicine, Atlanta, GA.

Publication information

Epidemiology. 2024 Nov 1;35(6):764-778. doi: 10.1097/EDE.0000000000001782. Epub 2024 Aug 16.

DOI: 10.1097/EDE.0000000000001782
PMID: 39150861

Abstract

Machine learning techniques for causal effect estimation can enhance the reliability of epidemiologic analyses, reducing their dependence on correct model specifications. However, the stochastic nature of many machine learning algorithms implies that the results derived from such approaches may be influenced by the random seed that is set before model fitting. In this work, we highlight the substantial influence of random seeds on a popular approach for machine learning-based causal effect estimation, namely doubly robust estimators. We illustrate that varying seeds can yield divergent scientific interpretations of doubly robust estimates produced from the same dataset. We propose techniques for stabilizing results across random seeds and, through an extensive simulation study, demonstrate that these techniques effectively neutralize seed-related variability without compromising the statistical efficiency of the estimators. Based on these findings, we offer practical guidelines to minimize the influence of random seeds in real-world applications, and we encourage researchers to explore the variability due to random seeds when implementing any method that involves random steps.
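
The seed dependence described in the abstract can be illustrated with a minimal sketch. The abstract does not say which stabilization techniques the authors propose, so everything here is an assumption made for illustration: the toy data-generating process (`simulate_data`, true effect 1.0), the stratum-mean nuisance estimators, the two-fold cross-fit AIPW estimator (`cross_fit_aipw`), and the choice to stabilize by averaging estimates over 20 seeds. The only random step is the fold assignment, which stands in for the stochastic parts of a real machine learning pipeline.

```python
import random
import statistics


def simulate_data(n, seed=0):
    """Hypothetical toy data: binary confounder x, treatment a, outcome y (true ATE = 1.0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if x else 0.3) else 0
        y = 1.0 * a + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def mean_or(values, default):
    """Mean of a list, or a fallback value when the list is empty."""
    return sum(values) / len(values) if values else default


def cross_fit_aipw(rows, seed, n_folds=2):
    """Cross-fit AIPW estimate of the ATE; the seed only controls the fold split."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # the seed-dependent step
    folds = [order[k::n_folds] for k in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in order if i not in held_out]
        y_bar = mean_or([y for _, _, y in train], 0.0)
        g, m = {}, {}  # propensity score and outcome regression, by stratum
        for x in (0, 1):
            p = mean_or([a for xx, a, _ in train if xx == x], 0.5)
            g[x] = min(max(p, 0.05), 0.95)  # truncate extreme propensities
            for a in (0, 1):
                m[a, x] = mean_or(
                    [y for xx, aa, y in train if xx == x and aa == a], y_bar
                )
        # Evaluate the doubly robust score on the held-out fold only.
        for i in fold:
            x, a, y = rows[i]
            scores.append(
                m[1, x] - m[0, x]
                + a * (y - m[1, x]) / g[x]
                - (1 - a) * (y - m[0, x]) / (1 - g[x])
            )
    return sum(scores) / len(scores)


data = simulate_data(400, seed=2024)

# Same dataset, 200 different seeds: each estimate differs only by fold split.
single_seed = [cross_fit_aipw(data, seed=s) for s in range(200)]

# Assumed stabilization: report the mean over a block of 20 seeds.
block_means = [statistics.mean(single_seed[b:b + 20]) for b in range(0, 200, 20)]

print("single-seed range:", min(single_seed), max(single_seed))
print("SD across single seeds:", statistics.stdev(single_seed))
print("SD across 20-seed averages:", statistics.stdev(block_means))
```

Comparing the two standard deviations shows how much of the estimate's variability comes from the seed alone and how much of it averaging removes. In a real analysis the same loop would wrap whatever estimator and learners are in use, and whether the mean, the median, or another summary is the better aggregate is a design choice this sketch does not settle.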



