Scuola Normale Superiore, Pisa, Italy.
Institute of Clinical Physiology, C.N.R, Pisa, Italy.
PLoS One. 2022 May 19;17(5):e0268327. doi: 10.1371/journal.pone.0268327. eCollection 2022.
We present a workflow for clinical data analysis based on Bayesian Structure Learning (BSL), an unsupervised learning approach that is robust to noise and biases, allows prior medical knowledge to be incorporated into the learning process, and provides explainable results in the form of a graph showing the causal connections among the analyzed features. The workflow is a multi-step approach that goes from identifying the main causes of patient outcome through BSL to building a tool suitable for clinical practice, based on a Binary Decision Tree (BDT), that recognizes high-risk patients using information already available at hospital admission. We evaluate our approach on a feature-rich Coronavirus disease (COVID-19) dataset, showing that the proposed framework provides a schematic overview of the multi-factorial processes that jointly contribute to the outcome. We compare our findings with the current literature on COVID-19, showing that this approach rediscovers established cause-effect relationships about the disease. Further, our approach yields a highly interpretable tool that correctly predicts the outcome of 85% of subjects based exclusively on 3 features: age, a previous history of chronic obstructive pulmonary disease, and the PaO2/FiO2 ratio at the time of arrival at the hospital. Including additional information from 4 routine blood tests (Creatinine, Glucose, pO2 and Sodium) increases predictive accuracy to 94.5%.
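To make the final step of the workflow concrete, the following is a minimal sketch of how a shallow binary decision tree over three admission-time features (age, COPD history, PaO2/FiO2 ratio) could be fitted. It is not the authors' implementation: the data are synthetic, the rule that generates the labels uses arbitrary cut-offs chosen only for illustration (they are not clinical thresholds and not results from the study), and the helper names `gini`, `best_split`, `fit` and `predict` are hypothetical. The BSL step that selects the features is not shown.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) pair that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def fit(rows, labels, depth=2):
    """Build a tree as nested tuples (feature, threshold, low, high); leaves are 0/1."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        return int(2 * sum(labels) >= len(labels))  # majority label
    f, t = split
    low = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    high = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            fit([r for r, _ in low], [y for _, y in low], depth - 1),
            fit([r for r, _ in high], [y for _, y in high], depth - 1))

def predict(tree, row):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

# Synthetic cohort: (age, COPD history 0/1, PaO2/FiO2 ratio).
# The labelling rule and its cut-offs are ARBITRARY assumptions for this demo.
random.seed(0)
rows, labels = [], []
for _ in range(200):
    age = random.uniform(30, 95)
    copd = random.randint(0, 1)
    pf_ratio = random.uniform(80, 450)
    high_risk = int(pf_ratio < 200 or (age > 75 and copd == 1))
    if random.random() < 0.1:  # label noise
        high_risk = 1 - high_risk
    rows.append((age, copd, pf_ratio))
    labels.append(high_risk)

tree = fit(rows, labels, depth=2)
accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

A depth-limited tree is used here because every prediction can be read off as a short chain of threshold checks, which is the kind of interpretability the abstract emphasizes. The accuracy printed by this sketch describes only the synthetic data and says nothing about the reported 85% and 94.5% figures.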