Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD.

Authors

Tomasoni Danilo, Lombardo Rosario, Lauria Mario

Affiliations

Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy.

Department of Economics, University of Verona, Verona, Italy.

Publication information

Front Genet. 2024 Jan 29;15:1270387. doi: 10.3389/fgene.2024.1270387. eCollection 2024.

DOI:10.3389/fgene.2024.1270387
PMID:38348453
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10859452/
Abstract

Preserving data privacy is an important concern in the research use of patient data. The DataSHIELD suite enables privacy-aware advanced statistical analysis in a federated setting. Despite its many applications, it has a few open practical issues: the complexity of hosting a federated infrastructure, the performance penalty imposed by the privacy-preserving constraints, and limited ease of use for non-technical users. In this work, we describe a case study in which we review different breast cancer classifiers and report our findings about the limits and advantages of such a non-disclosive suite of tools in a realistic setting. Five independent gene expression datasets of breast cancer survival were downloaded from Gene Expression Omnibus (GEO) and pooled together through the federated infrastructure. Three previously published and two newly proposed 5-year cancer-free survival risk score classifiers were trained in a federated environment, and an additional reference classifier was trained with unconstrained data access. The performance of these six classifiers was systematically evaluated, and the results show that i) the published classifiers do not generalize well when applied to patient cohorts that differ from those used to develop them; ii) among the methods we tried, classification using logistic regression worked better on average, closely followed by random forest; iii) the unconstrained version of the logistic regression classifier outperformed the federated version by 4% on average. Reproducibility of our experiments is ensured through the use of VisualSHIELD, an open-source tool that augments DataSHIELD with new functions, a standardized deployment procedure, and a simple graphical user interface.
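
The federated-versus-unconstrained comparison in the abstract can be illustrated with a minimal sketch. It is not the authors' method and does not use the DataSHIELD or VisualSHIELD API: the function names (`local_gradient`, `fit_federated`, `fit_pooled`), the synthetic cohorts, and the plain gradient-descent fit are all assumptions made for illustration. Each simulated site returns only a summed gradient and a row count, never individual records, and a coordinator combines them. In this idealized setting the federated and pooled fits coincide, so the sketch shows only the data flow and does not model the privacy-preserving constraints whose cost the abstract reports.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_gradient(X, y, w):
    """Runs inside one site: returns only an aggregate gradient and a row count."""
    return X.T @ (sigmoid(X @ w) - y), len(y)

def fit_federated(sites, n_features, lr=0.1, steps=500):
    """Coordinator loop: sums per-site aggregates, never sees raw rows."""
    w = np.zeros(n_features)
    for _ in range(steps):
        parts = [local_gradient(X, y, w) for X, y in sites]
        grad = sum(g for g, _ in parts)
        n = sum(k for _, k in parts)
        w -= lr * grad / n
    return w

def fit_pooled(X, y, lr=0.1, steps=500):
    """Reference fit with unconstrained access to all rows at once."""
    return fit_federated([(X, y)], X.shape[1], lr, steps)

# Five synthetic "cohorts" (hypothetical data, not the GEO datasets).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for n in (40, 60, 80, 50, 70):
    X = rng.normal(size=(n, 3))
    y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)
    sites.append((X, y))

w_fed = fit_federated(sites, 3)
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
w_pool = fit_pooled(X_all, y_all)
print("federated:", w_fed)
print("pooled:   ", w_pool)
```

Summing gradients across sites is one simple way to keep row-level data local; a real deployment would add disclosure checks on every returned aggregate, which this sketch omits.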

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6ca/10859452/99cd840b893d/fgene-15-1270387-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6ca/10859452/307624f94352/fgene-15-1270387-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6ca/10859452/4d7d263e4459/fgene-15-1270387-g003.jpg
