Prediction of Tuberculosis Using an Automated Machine Learning Platform for Models Trained on Synthetic Data.

Author information

Rashidi Hooman H, Khan Imran H, Dang Luke T, Albahra Samer, Ratan Ujjwal, Chadderwala Nihir, To Wilson, Srinivas Prathima, Wajda Jeffery, Tran Nam K

Affiliations

Department of Pathology and Laboratory Medicine, University of California, Davis, School of Medicine, Sacramento, California, United States of America.

Amazon Web Services, Seattle, Washington, United States of America.

Publication information

J Pathol Inform. 2022 Jan 20;13:10. doi: 10.4103/jpi.jpi_75_21. eCollection 2022.

DOI:10.4103/jpi.jpi_75_21
PMID:35136677
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8794034/
Abstract

High-quality medical data is critical to the development and implementation of machine learning (ML) algorithms in healthcare; however, security and privacy concerns continue to limit access. We sought to determine the utility of "synthetic data" in training ML algorithms for the detection of tuberculosis (TB) from inflammatory biomarker profiles. A retrospective dataset (A) comprising 278 patients was used to generate synthetic datasets (B, C, and D) for training models prior to secondary validation on a generalization dataset. ML models trained and validated on Dataset A (real) demonstrated an accuracy of 90%, a sensitivity of 89% (95% CI, 83-94%), and a specificity of 100% (95% CI, 81-100%). Models trained using the optimal synthetic dataset B showed an accuracy of 91%, a sensitivity of 93% (95% CI, 87-96%), and a specificity of 77% (95% CI, 50-93%). Synthetic datasets C and D displayed diminished performance (accuracies of 71% and 54%, respectively). This pilot study highlights the promise of synthetic data as an expedited means for ML algorithm development.
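The abstract reports each sensitivity and specificity with a 95% confidence interval but states neither the interval method nor the per-class counts. One common choice for a binomial proportion is the exact Clopper-Pearson interval. The sketch below is a minimal plain-Python illustration under that assumption, not the authors' code: it computes accuracy, sensitivity, and specificity from hypothetical confusion-matrix counts (`tp`, `fn`, `tn`, `fp`) and finds the exact bounds by bisection on the binomial CDF. All function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) == target, assuming f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above target, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _bisect_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _bisect_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int, alpha: float = 0.05) -> dict:
    """Accuracy, sensitivity, and specificity (with exact CIs) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true-positive rate among positive (TB) cases
        "sensitivity_ci": clopper_pearson(tp, tp + fn, alpha),
        "specificity": tn / (tn + fp),  # true-negative rate among negative cases
        "specificity_ci": clopper_pearson(tn, tn + fp, alpha),
    }


# Hypothetical counts for illustration only; these are not the study's data.
metrics = diagnostic_metrics(tp=90, fn=10, tn=17, fp=0)
print(metrics)
```

When every case in a class is classified correctly (x = n), the exact lower bound reduces to (α/2)^(1/n). For a hypothetical 17 negatives all correctly identified, this is 0.025^(1/17) ≈ 0.805, which illustrates how a point estimate of 100% can sit alongside a wide interval such as 81-100% when a class is small.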

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/8e7552b3fefb/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/139d1b6da3ae/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/4fce6ddfa2d7/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/7932ac19aa16/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/852f58c734cd/gr4.jpg

Similar articles

1. Prediction of Tuberculosis Using an Automated Machine Learning Platform for Models Trained on Synthetic Data.
   J Pathol Inform. 2022 Jan 20;13:10. doi: 10.4103/jpi.jpi_75_21. eCollection 2022.
2. Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset.
   PLoS One. 2023 Mar 16;18(3):e0283094. doi: 10.1371/journal.pone.0283094. eCollection 2023.
3. Automated machine learning for endemic active tuberculosis prediction from multiplex serological data.
   Sci Rep. 2021 Sep 9;11(1):17900. doi: 10.1038/s41598-021-97453-7.
4. Replication of machine learning methods to predict treatment outcome with antidepressant medications in patients with major depressive disorder from STAR*D and CAN-BIND-1.
   PLoS One. 2021 Jun 28;16(6):e0253023. doi: 10.1371/journal.pone.0253023. eCollection 2021.
5. Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data.
   PLoS One. 2024 Feb 5;19(2):e0297271. doi: 10.1371/journal.pone.0297271. eCollection 2024.
6. Privacy preserving Generative Adversarial Networks to model Electronic Health Records.
   Neural Netw. 2022 Sep;153:339-348. doi: 10.1016/j.neunet.2022.06.022. Epub 2022 Jun 25.
7. Evaluating and Enhancing the Generalization Performance of Machine Learning Models for Physical Activity Intensity Prediction From Raw Acceleration Data.
   IEEE J Biomed Health Inform. 2020 Jan;24(1):27-38. doi: 10.1109/JBHI.2019.2917565. Epub 2019 May 20.
8. Machine Learning Approach to Predict Positive Screening of Methicillin-Resistant Staphylococcus aureus During Mechanical Ventilation Using Synthetic Dataset From MIMIC-IV Database.
   Front Med (Lausanne). 2021 Nov 16;8:694520. doi: 10.3389/fmed.2021.694520. eCollection 2021.
9. Prediction of Neurological Outcomes in Out-of-hospital Cardiac Arrest Survivors Immediately after Return of Spontaneous Circulation: Ensemble Technique with Four Machine Learning Models.
   J Korean Med Sci. 2021 Jul 19;36(28):e187. doi: 10.3346/jkms.2021.36.e187.
10. OCT-based deep learning algorithm for the evaluation of treatment indication with anti-vascular endothelial growth factor medications.
    Graefes Arch Clin Exp Ophthalmol. 2018 Jan;256(1):91-98. doi: 10.1007/s00417-017-3839-y. Epub 2017 Nov 10.

Cited by

1. Enhancing and Not Replacing Clinical Expertise: Improving Named-Entity Recognition in Colonoscopy Reports Through Mixed Real-Synthetic Training Sources.
   J Pers Med. 2025 Jul 30;15(8):334. doi: 10.3390/jpm15080334.
2. Diagnostic Performance of Artificial Intelligence-Based Methods for Tuberculosis Detection: Systematic Review.
   J Med Internet Res. 2025 Mar 7;27:e69068. doi: 10.2196/69068.