Unità Operativa Centro Controllo Qualità e Rischio Chimico (CQRC), Azienda Ospedaliera Villa Sofia Cervello, Palermo, Italy.
Department of Medicine and Rehabilitation, Policlinico di Monza, Monza, Italy; Department of Medicine and Surgery, School of Medicine, University of Milano-Bicocca, Monza, Italy.
Int J Med Inform. 2023 Feb;170:104932. doi: 10.1016/j.ijmedinf.2022.104932. Epub 2022 Nov 25.
The progress of digital transformation in clinical practice opens the door to shifting the current clinical pathway for liver disease diagnosis from a late-stage approach to an early-stage one. Early diagnosis of liver fibrosis can prevent disease progression and decrease liver-related morbidity and mortality. Here we developed a machine learning (ML) algorithm, based on standard parameters, that can identify liver fibrosis in the general US population.
Starting from a public database (National Health and Nutrition Examination Survey, NHANES) representative of the American population, with 7265 eligible subjects (control population n = 6828, with Fibroscan values E < 9.7 kPa; target population n = 437, with Fibroscan values E ≥ 9.7 kPa), we set up a support vector machine (SVM) algorithm able to discriminate individuals with liver fibrosis within the general US population. The algorithm setup involved the removal of missing data and a sampling optimization step to manage the data imbalance (only ∼ 5 % of the dataset is the target population).
For feature selection, we performed an unbiased analysis, starting from 33 clinical, anthropometric, and biochemical parameters regardless of their previous application as biomarkers of liver diseases. Through principal component analysis (PCA), we identified the 26 most significant features and then used them to optimize the sampling method for the SVM algorithm. The best sampling technique to manage the data imbalance was found to be oversampling with SMOTE-NC. For final model validation, we used a subset of 300 individuals (150 with liver fibrosis and 150 controls), held out from the main dataset prior to sampling. Performance was evaluated over multiple independent runs.
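The hold-out and oversampling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, randomly generated placeholder features instead of NHANES data, and plain SMOTE on numeric features as a simplified stand-in for SMOTE-NC (which also handles categorical features). The function names `balanced_holdout` and `smote_numeric`, the choice of k = 5 neighbours, and the decision to oversample to full class balance are assumptions made for illustration; the SVM training itself is omitted.

```python
import random


def balanced_holdout(labels, n_per_class, rng):
    """Draw n_per_class indices from each class; return (holdout, remaining)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    holdout = set(rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class))
    remaining = [i for i in range(len(labels)) if i not in holdout]
    return sorted(holdout), remaining


def smote_numeric(minority, n_new, k, rng):
    """Plain SMOTE (numeric features only): interpolate between a minority
    sample and one of its k nearest minority neighbours."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


rng = random.Random(0)

# Cohort sizes from the abstract: 6828 controls (label 0), 437 with fibrosis (label 1).
labels = [0] * 6828 + [1] * 437
# Placeholder two-dimensional features (assumption; the study used 26 real features).
features = [[rng.gauss(y, 1.0), rng.gauss(2 * y, 1.0)] for y in labels]

# Step 1: set aside 150 + 150 subjects for validation BEFORE any resampling.
holdout_idx, train_idx = balanced_holdout(labels, 150, rng)

# Step 2: oversample the minority class in the training portion only.
train_minority = [features[i] for i in train_idx if labels[i] == 1]
n_train_majority = sum(1 for i in train_idx if labels[i] == 0)
synthetic = smote_numeric(
    train_minority, n_train_majority - len(train_minority), k=5, rng=rng
)

print(len(holdout_idx), n_train_majority, len(train_minority), len(synthetic))
```

Splitting off the validation subset before oversampling keeps synthetic samples out of the validation data, which is the reason the abstract stresses that the 300 individuals were removed prior to sampling.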
We provide proof of concept of an ML clinical decision support tool for liver fibrosis diagnosis in the general US population. Although the presented ML model is at this stage only a prototype, in the future it might be implemented and potentially applied in broad screening programs for liver fibrosis.