Variational Autoencoder Modular Bayesian Networks for Simulation of Heterogeneous Clinical Study Data.

Authors

Gootjes-Dreesbach Luise, Sood Meemansa, Sahay Akrishta, Hofmann-Apitius Martin, Fröhlich Holger

Affiliations

UCB Pharma (UCB Celltech Ltd.), Slough, United Kingdom.

Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany.

Publication information

Front Big Data. 2020 May 28;3:16. doi: 10.3389/fdata.2020.00016. eCollection 2020.

DOI: 10.3389/fdata.2020.00016
PMID: 33693390
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7931863/
Abstract

In the area of Big Data, one of the major obstacles to progress in biomedical research is the existence of data "silos": legal and ethical constraints often do not allow sensitive patient data from clinical studies to be shared across institutions. While federated machine learning now allows models to be built from scattered data of the same format, there is still a need to investigate, mine, and understand data from separate and very differently designed clinical studies that can only be accessed within each of the data-hosting organizations. Simulating sufficiently realistic virtual patients based on the data within each individual organization could be a way to fill this gap. In this work, we propose a new machine learning approach [Variational Autoencoder Modular Bayesian Network (VAMBN)] to learn a generative model of longitudinal clinical study data. VAMBN considers typical key aspects of such data, namely limited sample size coupled with comparably many variables of different numerical scales and statistical properties, and many missing values. We show that with VAMBN, we can simulate virtual patients in a sufficiently realistic manner while providing theoretical guarantees on data privacy. In addition, VAMBN allows for simulating counterfactual scenarios. Hence, VAMBN could facilitate data sharing as well as the design of clinical trials.
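
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of simulating virtual patients from a Bayesian network over modules, and it is not the authors' VAMBN code. It assumes a hypothetical three-node, linear-Gaussian network in which each node stands for a one-dimensional code of one group of variables (the kind of low-dimensional representation an autoencoder could supply). The node names, weights, and noise levels are invented for illustration, and the "counterfactual" shown is a simple intervention that fixes one node to a chosen value.

```python
import random

# Hypothetical module-level DAG (illustration only, not taken from the paper).
# Each node is a 1-D code for one module, e.g. one variable group at one visit.
# Parent weights, biases and noise standard deviations are made-up values.
NETWORK = {
    "baseline": {"parents": {}, "bias": 0.0, "noise": 1.0},
    "visit1": {"parents": {"baseline": 0.8}, "bias": 0.1, "noise": 0.5},
    "visit2": {"parents": {"baseline": 0.2, "visit1": 0.7}, "bias": 0.0, "noise": 0.5},
}
ORDER = ["baseline", "visit1", "visit2"]  # a topological order of the DAG


def sample_patient(rng, interventions=None):
    """Draw one virtual patient by ancestral sampling.

    Nodes listed in `interventions` are fixed to the given value instead of
    being sampled, which mimics an intervention on that module.
    """
    interventions = interventions or {}
    values = {}
    for node in ORDER:
        if node in interventions:
            values[node] = interventions[node]
            continue
        spec = NETWORK[node]
        mean = spec["bias"] + sum(w * values[p] for p, w in spec["parents"].items())
        values[node] = rng.gauss(mean, spec["noise"])
    return values


def simulate(n, seed=0, interventions=None):
    """Simulate `n` virtual patients with a reproducible random seed."""
    rng = random.Random(seed)
    return [sample_patient(rng, interventions) for _ in range(n)]


def mean_of(patients, node):
    """Average value of one node across a simulated cohort."""
    return sum(p[node] for p in patients) / len(patients)


if __name__ == "__main__":
    observed = simulate(5000)
    intervened = simulate(5000, interventions={"baseline": 2.0})
    print("mean visit2, no intervention:", round(mean_of(observed, "visit2"), 2))
    print("mean visit2, baseline fixed to 2.0:", round(mean_of(intervened, "visit2"), 2))
```

In a fuller pipeline, each sampled code would be passed through a trained decoder to recover the original clinical variables; that step is omitted here because it depends on model details the abstract does not state.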

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/e05129be353d/fdata-03-00016-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/9d41689c4750/fdata-03-00016-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/8be0f6795bb0/fdata-03-00016-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/6b17269f931a/fdata-03-00016-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/86dbad486124/fdata-03-00016-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/f10dd50a9a3f/fdata-03-00016-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/5ba09118f4cd/fdata-03-00016-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/fa56fc8becec/fdata-03-00016-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9ad/7931863/9a8f416a48d5/fdata-03-00016-g0009.jpg
