Maximally informative feature selection using Information Imbalance: Application to COVID-19 severity prediction.

Affiliations

International School for Advanced Studies (SISSA), Via Bonomea 265, 34136, Trieste, Italy.

Infectious Disease Unit, Azienda Sanitaria Universitaria Friuli Centrale (ASU FC), Via Pozzuolo 330, 33100, Udine, Italy.

Publication information

Sci Rep. 2024 May 10;14(1):10744. doi: 10.1038/s41598-024-61334-6.

DOI:10.1038/s41598-024-61334-6

PMID:38730063

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11087653/

Abstract

Clinical databases typically include many heterogeneous features for each patient, such as blood tests, the clinical history before disease onset, the evolution of symptoms, and the results of imaging exams. Here we propose to use a recently developed statistical approach, the Information Imbalance, to compare different subsets of patient features and automatically select the subset that is maximally informative for a given clinical purpose, especially in minority classes. We adapt the Information Imbalance approach to a clinical setting, where patient features are often categorical and are generally available only for a fraction of the patients. We apply this algorithm to a data set of 1300 patients treated for COVID-19 at Udine hospital before October 2021. With this approach, we find sets of features which, when used together, are maximally informative of the clinical fate and of the severity of the disease. The optimal number of features, which is determined automatically, turns out to be between 10 and 15. These features can be measured at admission. The approach can also be used when features are available only for a fraction of the patients, does not require imputation and, importantly, automatically selects features with small inter-feature correlation. Clinical insights deriving from this study are also discussed.
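
The abstract does not spell out how the Information Imbalance is computed. As a rough illustration only, the sketch below follows the commonly cited definition of the statistic: for each point, find its nearest neighbor in feature space A, look up the rank of that neighbor in feature space B, and rescale the average rank by 2/N. Values near 0 suggest that A predicts B well, and values near 1 suggest that A carries little information about B. The function name `information_imbalance`, the Euclidean metric, and the dense-matrix approach are assumptions made for this example. The sketch ignores ties from categorical features and missing values, both of which the paper's adapted version is designed to handle.

```python
import numpy as np


def information_imbalance(X_a, X_b):
    """Estimate Delta(A -> B) for two feature spaces over the same N points.

    Illustrative sketch only: Euclidean distances, no handling of ties
    or missing values (hypothetical helper, not the authors' code).
    """
    n = X_a.shape[0]

    # Pairwise Euclidean distance matrices in each feature space.
    d_a = np.linalg.norm(X_a[:, None, :] - X_a[None, :, :], axis=-1)
    d_b = np.linalg.norm(X_b[:, None, :] - X_b[None, :, :], axis=-1)

    # Exclude each point from its own neighbor list.
    np.fill_diagonal(d_a, np.inf)
    np.fill_diagonal(d_b, np.inf)

    # Index of the nearest neighbor of every point in space A.
    nn_a = np.argmin(d_a, axis=1)

    # Rank of every point j relative to point i in space B (1 = nearest).
    ranks_b = np.argsort(np.argsort(d_b, axis=1), axis=1) + 1

    # Average rank in B of the A-nearest neighbor, rescaled by 2 / N.
    return 2.0 / n * ranks_b[np.arange(n), nn_a].mean()
```

In a feature-selection loop, one would compare candidate feature subsets by their imbalance toward the target space (for example, a severity label) and keep the subset with the lowest value. How the paper searches over subsets is not described in the abstract, so that step is left out here.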

