Prediction of Tuberculosis Using an Automated Machine Learning Platform for Models Trained on Synthetic Data.

Author information

Rashidi Hooman H, Khan Imran H, Dang Luke T, Albahra Samer, Ratan Ujjwal, Chadderwala Nihir, To Wilson, Srinivas Prathima, Wajda Jeffery, Tran Nam K

Affiliations

Department of Pathology and Laboratory Medicine, University of California, Davis, School of Medicine, Sacramento, California, United States of America.

Amazon Web Services, Seattle, Washington, United States of America.

Publication information

J Pathol Inform. 2022 Jan 20;13:10. doi: 10.4103/jpi.jpi_75_21. eCollection 2022.

Abstract

High-quality medical data is critical to the development and implementation of machine learning (ML) algorithms in healthcare; however, security and privacy concerns continue to limit access to such data. We sought to determine the utility of "synthetic data" for training ML algorithms to detect tuberculosis (TB) from inflammatory biomarker profiles. A retrospective dataset (A) of 278 patients was used to generate synthetic datasets (B, C, and D) for training models prior to secondary validation on a generalization dataset. ML models trained and validated on Dataset A (real data) demonstrated an accuracy of 90%, a sensitivity of 89% (95% CI, 83-94%), and a specificity of 100% (95% CI, 81-100%). Models trained on the optimal synthetic dataset, B, showed an accuracy of 91%, a sensitivity of 93% (95% CI, 87-96%), and a specificity of 77% (95% CI, 50-93%). Synthetic datasets C and D yielded lower performance (accuracies of 71% and 54%, respectively). This pilot study highlights the promise of synthetic data as a means of expediting ML algorithm development.
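The abstract reports accuracy, sensitivity, and specificity with 95% confidence intervals, but not the underlying confusion-matrix counts or which interval method was used. As a hedged illustration of how such figures are typically derived, the sketch below computes the three metrics from a 2x2 confusion matrix and attaches an exact Clopper-Pearson interval to each proportion. The choice of Clopper-Pearson is an assumption (it is one common choice, not necessarily the authors'), the helper names are hypothetical, and the counts in the usage section are invented for illustration and are not the study's data.

```python
from math import comb


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases flagged positive
        "specificity": tn / (tn + fp),  # share of non-diseased cases flagged negative
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # share of all cases correct
    }


def prob_at_least(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p); increases monotonically with p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for the p where g(p) == target, assuming g increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided CI for the binomial proportion x / n."""
    # Lower bound: the p at which P(X >= x) = alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: prob_at_least(x, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2,
    # equivalently P(X >= x + 1) = 1 - alpha/2 (1 when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: prob_at_least(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT taken from the study.
    tp, fn, tn, fp = 90, 10, 40, 10
    m = diagnostic_metrics(tp, fn, tn, fp)
    sens_ci = clopper_pearson(tp, tp + fn)
    spec_ci = clopper_pearson(tn, tn + fp)
    print(f"accuracy    {m['accuracy']:.0%}")
    print(f"sensitivity {m['sensitivity']:.0%} "
          f"(95% CI {sens_ci[0]:.0%}-{sens_ci[1]:.0%})")
    print(f"specificity {m['specificity']:.0%} "
          f"(95% CI {spec_ci[0]:.0%}-{spec_ci[1]:.0%})")
```

One property worth noting: when every case in a group is classified correctly (x == n), the exact lower bound reduces to (alpha/2)^(1/n), so a point estimate of 100% can still carry a wide interval when the group is small. In practice, a library routine such as statsmodels' `proportion_confint(count, nobs, method="beta")` computes the same Clopper-Pearson interval.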

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a17d/8794034/139d1b6da3ae/gr1.jpg

Similar articles

Privacy preserving Generative Adversarial Networks to model Electronic Health Records.
Neural Netw. 2022 Sep;153:339-348. doi: 10.1016/j.neunet.2022.06.022. Epub 2022 Jun 25.
OCT-based deep learning algorithm for the evaluation of treatment indication with anti-vascular endothelial growth factor medications.
Graefes Arch Clin Exp Ophthalmol. 2018 Jan;256(1):91-98. doi: 10.1007/s00417-017-3839-y. Epub 2017 Nov 10.