Rohanian Omid, Kouchaki Samaneh, Soltan Andrew, Yang Jenny, Rohanian Morteza, Yang Yang, Clifton David
IEEE J Biomed Health Inform. 2023 Mar;27(3):1249-1258. doi: 10.1109/JBHI.2022.3230663. Epub 2023 Mar 7.
Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring, and general health assessment of potential patients, and may reduce operational strain on hospitals coping with the coronavirus pandemic. A range of machine learning techniques have been used in the literature to detect potential coronavirus cases from routine clinical data (blood tests and vital-sign measurements). Data breaches and information leakage when using these models can cause reputational damage and legal issues for hospitals, yet protecting healthcare models against the leakage of potentially sensitive information remains an understudied research area. In this study, two machine learning techniques that aim to predict a patient's COVID-19 status are examined. Using adversarial training, robust deep learning architectures are explored with the aim of protecting attributes related to patients' demographic information. The two models examined in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments on datasets from Oxford University Hospitals (OUH), Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH), two neural networks are trained and evaluated. These networks predict PCR test results from basic laboratory blood tests and vital signs collected on a patient's arrival at hospital. The level of privacy each model can provide is assessed, and the efficacy and robustness of the proposed architectures are compared with a relevant baseline. A main contribution of this work is its particular focus on developing effective COVID-19 detection models with built-in mechanisms that selectively protect sensitive attributes against adversarial attacks.
Results on the hold-out test set and in external validation confirmed that adversarial learning had no impact on the generalisability of the model.
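The adversarial-training idea described above can be illustrated with a minimal sketch: a shared encoder feeds a task head (predicting the COVID-19 label) and an adversary head (predicting a sensitive demographic attribute), and the encoder is updated to minimise the task loss while maximising the adversary's loss, so the learned representation retains little attribute signal. This is a generic illustration in NumPy with linear layers and hand-derived gradients; the paper's actual architectures, losses, and hyperparameters (e.g. the reversal strength `lam` below) are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: feature 0 drives the task label, feature 1 the sensitive attribute.
n, d, h = 200, 10, 5                   # samples, input dim, representation dim
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)        # stand-in for the COVID-19 label
s = (X[:, 1] > 0).astype(float)        # stand-in for a demographic attribute

W_enc = rng.normal(scale=0.1, size=(d, h))   # shared encoder
w_task = rng.normal(scale=0.1, size=h)       # task head
w_adv = rng.normal(scale=0.1, size=h)        # adversary head
lam, lr = 0.5, 0.1                           # reversal strength, learning rate (assumed)

for _ in range(200):
    Z = X @ W_enc                      # shared representation
    p_task = sigmoid(Z @ w_task)
    p_adv = sigmoid(Z @ w_adv)

    # Binary cross-entropy gradients for each head (d/dlogit = p - target).
    w_task -= lr * (Z.T @ (p_task - y) / n)
    w_adv -= lr * (Z.T @ (p_adv - s) / n)

    # Encoder: descend on the task loss, ASCEND on the adversary loss
    # (equivalent to a gradient-reversal layer with strength lam).
    dZ = (np.outer(p_task - y, w_task) - lam * np.outer(p_adv - s, w_adv)) / n
    W_enc -= lr * (X.T @ dZ)

task_acc = np.mean((sigmoid(X @ W_enc @ w_task) > 0.5) == y)
adv_acc = np.mean((sigmoid(X @ W_enc @ w_adv) > 0.5) == s)
print(f"task accuracy: {task_acc:.2f}, adversary accuracy: {adv_acc:.2f}")
```

Because the task and attribute depend on independent features here, the encoder can keep the task-relevant signal while the reversal term pushes the adversary toward chance-level performance, which mirrors the selective protection of sensitive attributes the abstract describes.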