Prediction of Tuberculosis Using an Automated Machine Learning Platform for Models Trained on Synthetic Data.

Authors

Rashidi Hooman H, Khan Imran H, Dang Luke T, Albahra Samer, Ratan Ujjwal, Chadderwala Nihir, To Wilson, Srinivas Prathima, Wajda Jeffery, Tran Nam K

Affiliations

Department of Pathology and Laboratory Medicine, University of California, Davis, School of Medicine, Sacramento, California, United States of America.

Amazon Web Services, Seattle, Washington, United States of America.

Publication information

J Pathol Inform. 2022 Jan 20;13:10. doi: 10.4103/jpi.jpi_75_21. eCollection 2022.

DOI: 10.4103/jpi.jpi_75_21

PMID: 35136677

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8794034/

Abstract

High-quality medical data is critical to the development and implementation of machine learning (ML) algorithms in healthcare; however, security and privacy concerns continue to limit access. We sought to determine the utility of "synthetic data" in training ML algorithms for the detection of tuberculosis (TB) from inflammatory biomarker profiles. A retrospective dataset (A) comprising 278 patients was used to generate synthetic datasets (B, C, and D) for training models prior to secondary validation on a generalization dataset. ML models trained and validated on the real Dataset A demonstrated an accuracy of 90%, a sensitivity of 89% (95% CI, 83-94%), and a specificity of 100% (95% CI, 81-100%). Models trained using the optimal synthetic dataset B showed an accuracy of 91%, a sensitivity of 93% (95% CI, 87-96%), and a specificity of 77% (95% CI, 50-93%). Models trained on synthetic datasets C and D performed worse (accuracies of 71% and 54%, respectively). This pilot study highlights the promise of synthetic data as an expedited means for ML algorithm development.
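As a rough illustration of how the performance measures quoted in the abstract are defined, the sketch below computes accuracy, sensitivity, and specificity from a confusion matrix and attaches an approximate 95% confidence interval to each proportion. The abstract does not state which interval method the authors used, so the Wilson score interval here is an assumption chosen because it needs only the standard library. The function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for illustration only; NOT the study's data.
metrics = diagnostic_metrics(tp=90, fn=10, tn=18, fp=2)
for name, value in metrics.items():
    print(name, value)
```

With a small number of negative cases, the specificity interval is much wider than the sensitivity interval, which is consistent with the wide specificity intervals reported in the abstract.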
