
A cross-validation statistical framework for asymmetric data integration.

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan.

Publication information

Biometrics. 2023 Jun;79(2):1280-1292. doi: 10.1111/biom.13685. Epub 2022 May 23.

DOI:10.1111/biom.13685
PMID:35524490
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9637892/
Abstract

The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
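The core idea in the abstract, choosing external-data weights so that the leave-one-out error on the local data is as small as possible, can be illustrated for a linear model. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it uses a single external data set, a brute-force refit for each left-out local observation, and a grid search over one weight `w` in [0, 1], whereas the paper rewrites the LOOCV objective as a function of the weights and also covers Cox models. All function names (`weighted_ls`, `local_loocv_error`, `choose_weight`) and the simulated data are hypothetical.

```python
import numpy as np


def weighted_ls(X_loc, y_loc, X_ext, y_ext, w):
    """Least squares with local rows at weight 1 and external rows at weight w."""
    A = X_loc.T @ X_loc + w * (X_ext.T @ X_ext)
    b = X_loc.T @ y_loc + w * (X_ext.T @ y_ext)
    return np.linalg.solve(A, b)


def local_loocv_error(X_loc, y_loc, X_ext, y_ext, w):
    """Mean squared leave-one-out error, evaluated on local observations only."""
    n = len(y_loc)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop local observation i, keep all external rows
        beta = weighted_ls(X_loc[keep], y_loc[keep], X_ext, y_ext, w)
        errs[i] = (y_loc[i] - X_loc[i] @ beta) ** 2
    return float(errs.mean())


def choose_weight(X_loc, y_loc, X_ext, y_ext, grid=None):
    """Pick the external weight on a grid that minimizes the local LOOCV error."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)  # assumed grid; w = 0 ignores external data
    errors = [local_loocv_error(X_loc, y_loc, X_ext, y_ext, w) for w in grid]
    best = int(np.argmin(errors))
    return float(grid[best]), errors


# Simulated illustration: a small local sample and a large, biased external one.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_loc = rng.normal(size=(30, 3))
y_loc = X_loc @ beta_true + rng.normal(size=30)
X_ext = rng.normal(size=(500, 3))
y_ext_biased = X_ext @ (beta_true + 5.0) + rng.normal(size=500)

w_biased, _ = choose_weight(X_loc, y_loc, X_ext, y_ext_biased)
print("weight chosen for the biased external data:", w_biased)
```

Because the error is measured only on held-out local observations, an external data set generated from different parameters is penalized and receives a small weight, which is the behavior the abstract describes. With several external data sets, one weight per data set would be needed and a grid search would no longer scale.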


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/140e/9637892/d969c6106fd9/nihms-1804726-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/140e/9637892/3b0e15e44472/nihms-1804726-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/140e/9637892/4a3262e0239d/nihms-1804726-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/140e/9637892/60e004bc8859/nihms-1804726-f0004.jpg

Similar articles

1
A cross-validation statistical framework for asymmetric data integration.
Biometrics. 2023 Jun;79(2):1280-1292. doi: 10.1111/biom.13685. Epub 2022 May 23.
2
Bias in error estimation when using cross-validation for model selection.
BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
3
Cross-validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy.
J Anim Breed Genet. 2021 Sep;138(5):519-527. doi: 10.1111/jbg.12545. Epub 2021 Mar 17.
4
A semiparametric copula method for Cox models with covariate measurement error.
Lifetime Data Anal. 2016 Jan;22(1):1-16. doi: 10.1007/s10985-014-9315-7. Epub 2014 Dec 13.
5
Correcting for Measurement Error in Time-Varying Covariates in Marginal Structural Models.
Am J Epidemiol. 2016 Aug 1;184(3):249-58. doi: 10.1093/aje/kww068. Epub 2016 Jul 13.
6
Statistical methods of indirect comparison with real-world data for survival endpoint under non-proportional hazards.
J Biopharm Stat. 2022 Jul 4;32(4):582-599. doi: 10.1080/10543406.2022.2080696. Epub 2022 Jun 8.
7
Learning from local to global: An efficient distributed algorithm for modeling time-to-event data.
J Am Med Inform Assoc. 2020 Jul 1;27(7):1028-1036. doi: 10.1093/jamia/ocaa044.
8
Efficient approximate k-fold and leave-one-out cross-validation for ridge regression.
Biom J. 2013 Mar;55(2):141-55. doi: 10.1002/bimj.201200088. Epub 2013 Jan 24.
9
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
10
Evaluation of normalization methods for cDNA microarray data by k-NN classification.
BMC Bioinformatics. 2005 Jul 26;6:191. doi: 10.1186/1471-2105-6-191.

Cited by

1
Decoding per- and polyfluoroalkyl substances (PFAS) in hepatocellular carcinoma: a multi-omics and computational toxicology approach.
J Transl Med. 2025 May 2;23(1):504. doi: 10.1186/s12967-025-06517-z.
2
Predictive and personalized approaches for idiopathic pulmonary fibrosis: a Wnt-related gene set scoring framework integrating single-cell sequencing, spatial transcriptomics, and machine learning for diagnosis and prognosis.
Funct Integr Genomics. 2025 Mar 13;25(1):62. doi: 10.1007/s10142-025-01571-8.
3
The sustainable development of mathematics subject: An empirical analysis based on the academic attention and literature research.
Heliyon. 2023 Jul 29;9(8):e18750. doi: 10.1016/j.heliyon.2023.e18750. eCollection 2023 Aug.