Pérez-Gómez Eloy, Gómez José, Gonzalo Jennifer, Salgüero Sergio, Riado Daniel, Casas María Luisa, Gutiérrez María Luisa, Jaime Elena, Pérez-Martínez Enrique, García-Carretero Rafael, Ramos Javier, Fernández-Rodríguez Conrado, Catalá Myriam, Martino Luca, Barquero-Pérez Óscar
Department of Signal Theory and Communications, EIF, University Rey Juan Carlos, Fuenlabrada, Spain.
Department of Biology and Geology, Physics and Inorganic Chemistry, ESCET, University Rey Juan Carlos, Móstoles, Spain.
Front Med (Lausanne). 2025 Jun 9;12:1596476. doi: 10.3389/fmed.2025.1596476. eCollection 2025.
Managing chronic viral infections such as hepatitis C virus (HCV) infection often requires expensive healthcare resources and highly qualified personnel, making efficient diagnostic methods essential. Despite remarkable advances in HCV treatment, several challenges remain, including the need for faster diagnostic procedures that would allow universal screening.
We propose a novel approach that combines near-infrared spectroscopy (NIRS) and clinical data with machine learning (ML) to improve HCV detection in serum samples.
NIRS offers a fast, non-destructive, and residue-free alternative to traditional diagnostic methods, while ML models enable feature selection and predictive analysis. We applied L1-regularized Logistic Regression (L1-LR) to identify the most informative wavelengths for HCV detection within the 1,000-2,500 nm range, and then integrated these spectral features with routine clinical markers using a Random Forest (RF) model. Our dataset comprised 137 serum samples from 38 patients, each represented by a NIRS spectrum and clinical data from blood tests.
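The two-stage pipeline described above (sparse wavelength selection, then a forest over the selected wavelengths plus clinical markers) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are synthetic, scikit-learn is assumed as the library, and the helper `select_wavelengths`, the grid of 150 wavelengths, the 5 clinical markers, and the hyperparameters (`C=0.5`, 200 trees) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): 137 samples,
# 150 hypothetical wavelengths on a 1,000-2,500 nm grid, 5 clinical markers.
rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_clinical = 137, 150, 5
wavelengths = np.linspace(1000, 2500, n_wavelengths)
X_spec = rng.normal(size=(n_samples, n_wavelengths))
X_clin = rng.normal(size=(n_samples, n_clinical))

# The synthetic label depends on two spectral bands and one clinical marker.
signal = 2.0 * X_spec[:, 15] - 2.0 * X_spec[:, 92] + 1.5 * X_clin[:, 0]
y = (signal + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)


def select_wavelengths(X, y, C=0.5):
    """Return indices of wavelengths with non-zero L1-LR coefficients.

    Hypothetical helper: C (inverse regularization strength) is an
    assumed value, not one reported in the abstract.
    """
    X_scaled = StandardScaler().fit_transform(X)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X_scaled, y)
    return np.flatnonzero(model.coef_[0])


# Stage 1: the L1 penalty drives most coefficients to exactly zero,
# so the surviving indices act as the selected wavelengths.
selected = select_wavelengths(X_spec, y)
print("selected wavelengths (nm):", np.round(wavelengths[selected]))

# Stage 2: concatenate selected spectral features with clinical markers
# and fit a Random Forest on the combined feature matrix.
X_comb = np.hstack([X_spec[:, selected], X_clin])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_comb, y)
```

Because the 137 samples come from only 38 patients, any evaluation of a sketch like this would likely need splits grouped by patient so that samples from one person do not appear in both training and test sets; the abstract does not state how this was handled.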
After preprocessing with Standard Normal Variate (SNV) correction and downsampling, the best-performing RF model, which combined NIRS features and clinical data, achieved an accuracy of 72.2% and an AUC-ROC of 0.850, outperforming models using only clinical or spectral data. Feature importance analysis highlighted specific wavelengths near 1,150 nm, 1,410 nm, and 1,927 nm, associated with water molecular states and liver function biomarkers (GPT, GOT, GGT), reinforcing the biological relevance of this approach.
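The preprocessing step mentioned above can be sketched as follows. SNV centers and scales each spectrum by its own mean and standard deviation, which removes per-sample additive offsets and multiplicative scaling. The function names `snv` and `downsample`, the use of the sample standard deviation (`ddof=1`), and downsampling by averaging adjacent points are assumptions for illustration; the abstract does not specify these details.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: normalize each row (spectrum) on its own.

    Uses the sample standard deviation (ddof=1), an assumed convention.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def downsample(spectra, factor):
    """Reduce spectral resolution by averaging blocks of `factor` points.

    Block averaging is one plausible scheme (assumed here); trailing
    points that do not fill a complete block are dropped.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_keep = (spectra.shape[1] // factor) * factor
    blocks = spectra[:, :n_keep].reshape(spectra.shape[0], -1, factor)
    return blocks.mean(axis=2)


# Two toy spectra: the second is the first multiplied by 2, so SNV
# should map both to the same normalized curve.
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 8.0]])
normalized = snv(raw)
reduced = downsample(raw, factor=2)
```

In this toy case both rows of `normalized` coincide, which is the scale invariance that motivates SNV for spectra whose baselines differ between measurements.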
These findings suggest that integrating NIRS and clinical data through machine learning enhances HCV diagnostic capabilities, offering a scalable and non-invasive alternative for early detection and risk assessment.