Department of Computer Science, Swansea University, Swansea SA1 8EN, U.K.
Institute of Life Science, Swansea University, Swansea SA1 8EN, U.K.
IEEE J Transl Eng Health Med. 2020 Nov 24;9:3000113. doi: 10.1109/JTEHM.2020.3040236. eCollection 2021.
A growing elderly population suffering from incurable, chronic conditions such as dementia presents a continual strain on medical services, because mental impairment paired with high comorbidity results in increased hospitalization risk. Identifying at-risk individuals allows preventative measures to be taken to alleviate this strain. Electronic health records provide an opportunity for big data analysis to address such applications. Such data, however, present a challenging problem space for traditional statistics and machine learning due to high dimensionality and sparse data elements. This article proposes a novel machine learning methodology: entropy regularization with ensemble deep neural networks (ECNN), which provides high predictive performance for hospitalization of patients with dementia while enabling an interpretable heuristic analysis of the model architecture that can identify individual features of importance within a large feature domain space. Experiments on health records containing 54,647 features identified 10 event indicators within a patient timeline: a collection of diagnostic events, medication prescriptions and procedural events, the highest ranked being essential hypertension. Despite the substantial reduction in feature count, the resulting subset still provided a highly competitive hospitalization prediction (Accuracy: 0.759) compared to the full feature domain (Accuracy: 0.755) or traditional feature selection techniques (Accuracy: 0.737). This discovery, together with the heuristic evidence of correlation, supports further clinical study of these medical events as potential novel indicators. There also remains great potential for adaptation of ECNN within other medical big data domains as a data mining tool for novel risk factor identification.
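The abstract does not state the exact form of the entropy regularizer or how feature importance is read from the trained networks. As a rough illustration only, the sketch below assumes one plausible reading: each input feature's importance is its share of the absolute input-layer weight mass, the regularizer is the Shannon entropy of that distribution (low entropy means weight concentrated on few features), and importances are averaged across ensemble members before ranking. The function names (`feature_importance`, `entropy_penalty`, `top_k_features`) and the random weight matrices are hypothetical and are not taken from the article.

```python
import numpy as np

def feature_importance(w_in):
    """Share of absolute input-layer weight mass per feature.

    w_in: array of shape (n_features, n_hidden). Assumed layout, not from the article.
    """
    mass = np.abs(w_in).sum(axis=1)
    return mass / mass.sum()

def entropy_penalty(w_in, eps=1e-12):
    """Shannon entropy of the importance distribution (assumed regularizer form).

    Adding this term to a training loss would favor weight concentrated on few features.
    """
    p = feature_importance(w_in)
    return float(-(p * np.log(p + eps)).sum())

def top_k_features(weight_list, k):
    """Average importance over ensemble members and return the k highest-ranked indices."""
    mean_imp = np.mean([feature_importance(w) for w in weight_list], axis=0)
    return np.argsort(mean_imp)[::-1][:k]

# Synthetic stand-ins for trained input-layer weights (illustration only).
rng = np.random.default_rng(0)
dense = rng.normal(size=(100, 8))        # weight spread over all 100 features
sparse = np.zeros((100, 8))
sparse[:10] = rng.normal(size=(10, 8))   # weight concentrated on 10 features

print(entropy_penalty(dense), entropy_penalty(sparse))
print(sorted(top_k_features([sparse, sparse], k=10).tolist()))
```

Under these assumptions the concentrated weight matrix yields the lower entropy, and the ranking recovers the ten features that carry weight, which mirrors the idea of reducing a large feature space to a small set of indicators.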