Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.

Authors

Long James P, Ha Min Jin

Affiliation

Department of Biostatistics, University of Texas MD Anderson Cancer Center, Texas, USA.

Publication information

Stat Anal Data Min. 2022 Feb;15(1):5-14. doi: 10.1002/sam.11559. Epub 2021 Oct 20.

DOI: 10.1002/sam.11559
PMID: 35498876
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9053600/

Abstract

Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. In particular, biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on the genetic perturbation data set of Kemmeren [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association-based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use the Kemmeren data.
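
The general point that a non-randomly selected evaluation set can make a model look better than it is can be illustrated with a toy simulation. The sketch below is not the authors' analysis and does not use the Kemmeren data; the quadratic data-generating process, the selection rule `x > 1`, and the function name `simulate` are all assumptions made purely for illustration. A linear model is fitted on a selected subsample and then scored on a held-out set drawn under the same selection rule and on a fresh sample from the full population.

```python
import numpy as np


def simulate(seed=0, n=2000):
    """Return (mse_selected_test, mse_population_test) for a toy linear fit.

    Everything here is hypothetical: the truth y = x**2 + noise and the
    selection rule x > 1 are illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)

    # Full population: x spread over [-3, 3], nonlinear truth plus noise.
    x = rng.uniform(-3.0, 3.0, size=n)
    y = x**2 + rng.normal(0.0, 0.5, size=n)

    # Selected subsample: only units with x > 1 are available.
    idx = np.flatnonzero(x > 1.0)
    rng.shuffle(idx)
    half = idx.size // 2
    train_idx, selected_test_idx = idx[:half], idx[half:]

    # Fit a straight line on the selected training units.
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

    def mse(x_eval, y_eval):
        pred = slope * x_eval + intercept
        return float(np.mean((y_eval - pred) ** 2))

    # Evaluation 1: held-out units chosen by the same selection rule.
    mse_selected = mse(x[selected_test_idx], y[selected_test_idx])

    # Evaluation 2: a fresh sample from the whole population.
    x_new = rng.uniform(-3.0, 3.0, size=n)
    y_new = x_new**2 + rng.normal(0.0, 0.5, size=n)
    mse_population = mse(x_new, y_new)

    return mse_selected, mse_population


if __name__ == "__main__":
    selected, population = simulate()
    print(f"MSE on selected test set:   {selected:.3f}")
    print(f"MSE on population test set: {population:.3f}")
```

In this toy setting the error measured on the selected test set is much smaller than the error on the population sample, because the line is only a good approximation inside the selected region. That gap is the kind of optimism that a less-biased evaluation set is meant to expose.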

Similar articles

1. Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.
   Stat Anal Data Min. 2022 Feb;15(1):5-14. doi: 10.1002/sam.11559. Epub 2021 Oct 20.
2. Instrumental variables and inverse probability weighting for causal inference from longitudinal observational studies.
   Stat Methods Med Res. 2004 Feb;13(1):17-48. doi: 10.1191/0962280204sm351ra.
3. COVID-19 and the epistemology of epidemiological models at the dawn of AI.
   Ann Hum Biol. 2020 Sep;47(6):506-513. doi: 10.1080/03014460.2020.1839132.
4. Causal simulation experiments: Lessons from bias amplification.
   Stat Methods Med Res. 2022 Jan;31(1):3-46. doi: 10.1177/0962280221995963. Epub 2021 Nov 23.
5. Causal inference accounting for unobserved confounding after outcome regression and doubly robust estimation.
   Biometrics. 2019 Jun;75(2):506-515. doi: 10.1111/biom.13001. Epub 2019 Mar 29.
6. On model selection and model misspecification in causal inference.
   Stat Methods Med Res. 2012 Feb;21(1):7-30. doi: 10.1177/0962280210387717. Epub 2010 Nov 12.
7. Impact of nonrandom selection mechanisms on the causal effect estimation for two-sample Mendelian randomization methods.
   PLoS Genet. 2022 Mar 17;18(3):e1010107. doi: 10.1371/journal.pgen.1010107. eCollection 2022 Mar.
8. Double Robust Efficient Estimators of Longitudinal Treatment Effects: Comparative Performance in Simulations and a Case Study.
   Int J Biostat. 2019 Feb 26;15(2):/j/ijb.2019.15.issue-2/ijb-2017-0054/ijb-2017-0054.xml. doi: 10.1515/ijb-2017-0054.
9. The Causal Meaning of Genomic Predictors and How It Affects Construction and Comparison of Genome-Enabled Selection Models.
   Genetics. 2015 Jun;200(2):483-94. doi: 10.1534/genetics.114.169490. Epub 2015 Apr 23.
10. Causal graphical views of fixed effects and random effects models.
    Br J Math Stat Psychol. 2021 May;74(2):165-183. doi: 10.1111/bmsp.12217. Epub 2020 Oct 15.

Cited by

1. Causal models and prediction in cell line perturbation experiments.
   BMC Bioinformatics. 2025 Jan 7;26(1):4. doi: 10.1186/s12859-024-06027-7.
2. An evaluation of synthetic data augmentation for mitigating covariate bias in health data.
   Patterns (N Y). 2024 Feb 29;5(4):100946. doi: 10.1016/j.patter.2024.100946. eCollection 2024 Apr 12.

References

1. Methods for causal inference from gene perturbation experiments and validation.
   Proc Natl Acad Sci U S A. 2016 Jul 5;113(27):7361-8. doi: 10.1073/pnas.1510493113.
2. Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors.
   Cell. 2014 Apr 24;157(3):740-52. doi: 10.1016/j.cell.2014.02.054.
3. Regularization Paths for Generalized Linear Models via Coordinate Descent.
   J Stat Softw. 2010;33(1):1-22.