Biomedical Informatics Centre, Indian Council of Medical Research-National Institute for Research in Reproductive Health, Mumbai, 400012, India.
Centre for Global Health Research, St. Michael's Hospital, Unity Health Toronto, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
BMC Public Health. 2021 Oct 4;21(1):1787. doi: 10.1186/s12889-021-11829-y.
Machine learning (ML) algorithms have been successfully employed to predict outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS).
From the MDS, 18,826 unique childhood deaths at ages 1-59 months during 2004-13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms, namely support vector machine (SVM), gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree, were used to build the CoD prediction models.
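As an illustration of how one of the listed algorithms could assign a cause from reported signs/symptoms, here is a minimal k-nearest neighbor sketch. It is not the study's implementation: the records are synthetic, the four symptom indicators and the function names (`hamming`, `knn_predict`) are hypothetical, and Hamming distance with k = 3 is an assumed choice.

```python
from collections import Counter

# Synthetic, made-up records (NOT MDS data): (symptom indicators, cause label).
# Hypothetical indicator order: cough, fast_breathing, loose_stools, rash
TRAIN = [
    ((1, 1, 0, 0), "pneumonia"),
    ((1, 1, 0, 0), "pneumonia"),
    ((1, 0, 0, 0), "pneumonia"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((0, 1, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1), "measles"),
    ((0, 0, 0, 1), "measles"),
    ((1, 0, 0, 1), "measles"),
]


def hamming(a, b):
    """Count the positions where two binary symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority cause among the k training records nearest to query."""
    nearest = sorted(train, key=lambda rec: hamming(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A record reporting cough and fast breathing only.
    print(knn_predict(TRAIN, (1, 1, 0, 0)))
```

With real verbal autopsy data the feature vector would be far longer, and the value of k and the distance measure would need tuning on held-out records.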
The SVM algorithm was the best performer, with a prediction accuracy of over 0.8. Among the individual causes, accuracy was highest for diarrhoeal diseases (accuracy = 0.97) and lowest for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classifying each of these CoDs were also extracted. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis.
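The abstract does not state how the per-cause accuracies were computed. One common convention, assumed here, is one-vs-rest accuracy: a prediction counts as correct for a given cause when the true and predicted labels agree on whether the death was due to that cause. The sketch below uses made-up labels and hypothetical function names to show how an overall figure and per-cause figures can be derived from the same predictions.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of records whose predicted cause equals the true cause."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_cause_accuracy(y_true, y_pred, cause):
    """One-vs-rest accuracy (assumed convention): a record is correct when
    truth and prediction agree on whether the death was due to `cause`."""
    correct = sum((t == cause) == (p == cause) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (P = pneumonia, D = diarrhoeal
# diseases, M = meningitis/encephalitis); these are not study results.
Y_TRUE = ["P", "P", "P", "P", "D", "D", "D", "M", "M", "M"]
Y_PRED = ["P", "P", "P", "M", "D", "D", "D", "M", "P", "M"]

if __name__ == "__main__":
    print("overall", overall_accuracy(Y_TRUE, Y_PRED))
    for cause in ("P", "D", "M"):
        print(cause, per_cause_accuracy(Y_TRUE, Y_PRED, cause))
```

Under this convention a cause that is rarely confused with others can score well above the overall accuracy, because every correctly excluded record also counts in its favour.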
Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.