Wang Haohan, Wu Zhenglin, Xing Eric P
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
Pac Symp Biocomput. 2019;24:54-65.
The proliferation of healthcare data has created opportunities to apply data-driven approaches, such as machine learning methods, to assist diagnosis. Recently, many deep learning methods have shown impressive success in predicting disease status from raw input data. However, the "black-box" nature of deep learning and the high-reliability requirement of biomedical applications have created new challenges regarding the existence of confounding factors. In this paper, after briefly arguing that inappropriate handling of confounding factors leads to sub-optimal model performance in real-world applications, we present an efficient method that removes the influence of confounding factors such as age or gender to improve the across-cohort prediction accuracy of neural networks. One distinct advantage of our method is that it requires only minimal changes to the baseline model's architecture, so it can be plugged into most existing neural networks. We conduct experiments on CT-scan, MRA, and EEG brain-wave data with convolutional neural networks and LSTMs to verify the efficiency of our method.