Semisupervised transfer learning for evaluation of model classification performance.

Author affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States.

Division of Biostatistics, Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, United States.

Publication information

Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujae002.

DOI:10.1093/biomtc/ujae002

PMID:38465982

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10926267/

Abstract

In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information pose challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to unlabeled target populations using existing labeled data from a source population. However, there is a paucity of literature on transferring the performance metrics of a trained model, especially receiver operating characteristic (ROC) parameters. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on ROC analysis. We propose Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data and improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under correct specification of either the density ratio model or the outcome model. We also use cross-validation to correct for potential overfitting bias of the estimators in finite samples. We compare the proposed estimators with existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving EHR cohort.
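The density ratio weighting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the STEAM procedure: it omits the double-index calibration, the robust imputation step, and the cross-validation correction, and only shows how labeled source data might be reweighted to estimate an AUC on an unlabeled target population. The function names (`fit_logistic`, `density_ratio_weights`, `weighted_auc`), the logistic density ratio model, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25, ridge=1e-6):
    """Fit a logistic regression of z on X (with intercept) by Newton's method."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (z - p) - ridge * beta
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def density_ratio_weights(X_source, X_target):
    """Weights on source points proportional to p_target(x) / p_source(x).

    Assumption: the log density ratio is linear in x, so the odds of a
    source-vs-target classifier recover the ratio up to a constant.
    """
    X = np.vstack([X_source, X_target])
    z = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    beta = fit_logistic(X, z)
    w = np.exp(beta[0] + X_source @ beta[1:])
    return w / w.mean()  # normalize so the weights average to 1


def weighted_auc(scores, labels, weights):
    """Weighted probability that a positive outscores a negative (ties count 1/2)."""
    pos = labels == 1
    s_p, w_p = scores[pos], weights[pos]
    s_n, w_n = scores[~pos], weights[~pos]
    cmp = (s_p[:, None] > s_n[None, :]) + 0.5 * (s_p[:, None] == s_n[None, :])
    return float(w_p @ cmp @ w_n / (w_p.sum() * w_n.sum()))


# Simulated example (hypothetical data): covariates shift between populations,
# labels are observed only in the source sample.
rng = np.random.default_rng(0)
n_s, n_t = 500, 2000
X_s = rng.normal(0.0, 1.0, size=(n_s, 2))   # labeled source covariates
X_t = rng.normal(0.5, 1.0, size=(n_t, 2))   # unlabeled target covariates
p_s = 1.0 / (1.0 + np.exp(-(X_s[:, 0] + X_s[:, 1])))
y_s = rng.binomial(1, p_s)                   # source labels
scores_s = X_s[:, 0]                         # score of some already-trained classifier

w = density_ratio_weights(X_s, X_t)
auc_source = weighted_auc(scores_s, y_s, np.ones(n_s))
auc_target_est = weighted_auc(scores_s, y_s, w)
print("source AUC:", auc_source)
print("importance-weighted estimate of target AUC:", auc_target_est)
```

A weighting-only estimator like this one depends entirely on the density ratio model being correct. According to the abstract, STEAM additionally uses an outcome model so that the estimator remains consistent when either of the two models is correctly specified.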
