Huang Ruixuan, Liu Jundong, Wan Tsz Kin, Siriwanna Damrongrat, Woo Yat Ming Peter, Vodencarevic Asmir, Wong Chi Wah, Chan Kei Hang Katie
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China.
Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China.
Comput Biol Med. 2023 Mar;155:106176. doi: 10.1016/j.compbiomed.2022.106176. Epub 2022 Oct 28.
For severe cerebrovascular diseases such as stroke, predicting patients' short-term mortality is of great clinical significance. In this study, we combined six machine learning models, namely the Random Forest (RF) classifier, Adaptive Boosting (AdaBoost), the Extremely Randomised Trees (ExtraTree) classifier, the XGBoost classifier, TabNet, and DistilBERT, into a multi-level prediction model that used bioassay data and radiology text reports from haemorrhagic and ischaemic stroke patients to predict six-month mortality. Model performance was measured using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), precision, recall, and F1-score. The models were built using data from 19,616 haemorrhagic stroke patients and 50,178 ischaemic stroke patients. Combining laboratory test data, structured data, and textual radiology report data enhanced the performance of these novel six-month mortality prediction models. The achieved performance was as follows: AUROC = 0.89, AUPRC = 0.70, precision = 0.52, recall = 0.78, and F1-score = 0.63 for haemorrhagic patients, and AUROC = 0.88, AUPRC = 0.54, precision = 0.34, recall = 0.80, and F1-score = 0.48 for ischaemic patients. Such models could be used for mortality risk assessment and early identification of high-risk stroke patients, which could contribute to more efficient utilisation of healthcare resources for stroke survivors.
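The F1-score reported above is the harmonic mean of precision and recall, so the figures in the abstract can be cross-checked against one another. The following is a minimal sketch of that check, not code from the study; the helper name `f1_score` and the dictionary layout are assumptions made for illustration, and the inputs are the two-decimal values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (already rounded to two decimals).
# The dictionary layout is a hypothetical convenience, not the authors' format.
reported = {
    "haemorrhagic": {"precision": 0.52, "recall": 0.78, "f1": 0.63},
    "ischaemic": {"precision": 0.34, "recall": 0.80, "f1": 0.48},
}

for cohort, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{cohort}: reported F1 = {m['f1']:.2f}, recomputed = {recomputed:.3f}")
```

Because the inputs are rounded, the recomputed values are expected to match the reported F1-scores only to within about 0.01. In both cohorts recall is well above precision, which suggests the operating point favours catching high-risk patients at the cost of more false alarms.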