Murdaca Giuseppe, Caprioli Simone, Tonacci Alessandro, Billeci Lucia, Greco Monica, Negrini Simone, Cittadini Giuseppe, Zentilin Patrizia, Ventura Spagnolo Elvira, Gangemi Sebastiano
Department of Internal Medicine, Scleroderma Unit, Clinical Immunology Unit, University of Genoa, 16143 Genoa, Italy.
Radiology Unit, IRCCS Policlinico San Martino, 16132 Genoa, Italy.
Diagnostics (Basel). 2021 Oct 12;11(10):1880. doi: 10.3390/diagnostics11101880.
Systemic sclerosis (SSc) is a systemic immune-mediated disease characterized by fibrosis of the skin and organs, and it has the highest mortality among rheumatic diseases. Nervous system involvement has recently been demonstrated; nevertheless, lung involvement is considered the leading cause of death in SSc and should therefore be diagnosed early. Pulmonary function tests are not sensitive enough to be used for screening, so they should be complemented by other clinical examinations; however, this carries a risk of overtesting, with considerable costs for the health system and an unnecessary burden for patients. To this end, Machine Learning (ML) algorithms could be a useful add-on to current clinical practice and could help identify the most useful examinations to perform for diagnosis.
Here, we retrospectively collected high-resolution computed tomography, pulmonary function test, esophageal pH-impedance test, esophageal manometry and reflux disease questionnaire data from 38 patients with SSc. Using R, we applied several supervised ML algorithms, including lasso, ridge, elastic net, classification and regression trees (CART) and random forest, to estimate the most important predictors of pulmonary involvement from these data.
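The lasso, ridge and elastic net regressions named above differ only in the penalty added to the fitting loss. The following is a minimal sketch of that penalty, assuming the parameterization used by the R package glmnet; the abstract does not state which package was used, and the function name and example coefficients are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Return lam * (alpha * L1 + (1 - alpha) / 2 * squared L2) of the coefficients.

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty,
    and values in between give the elastic net.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(b) for b in coefs)          # sum of absolute coefficients
    l2_squared = sum(b * b for b in coefs)   # sum of squared coefficients
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Hypothetical coefficients, for illustration only (not taken from the study).
example_coefs = [0.5, -1.0, 0.0]
lasso = elastic_net_penalty(example_coefs, lam=0.1, alpha=1.0)
ridge = elastic_net_penalty(example_coefs, lam=0.1, alpha=0.0)
mixed = elastic_net_penalty(example_coefs, lam=0.1, alpha=0.5)
```

Because the lasso term penalizes absolute values, it can shrink some coefficients exactly to zero, which is why such models are often used to rank or select predictors.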
In terms of performance, the random forest algorithm outperformed the other classifiers, with an estimated root-mean-square error (RMSE) of 0.810. However, it proved computationally intensive, so the other classifiers remain useful when a shorter response time is needed.
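For reference, RMSE is the square root of the mean squared difference between observed and predicted values, so lower values indicate a better fit. A minimal sketch of the computation follows; the function name and the numbers are hypothetical and are not the study data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(observed, predicted)
```

RMSE is expressed in the units of the outcome variable, so a value such as 0.810 can only be interpreted relative to the scale of the outcome being predicted.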
Despite the notably small sample size, which may have limited the reliability of the results, the powerful tools available for ML can be useful for predicting early lung involvement in SSc patients. Combining predictors from spirometry and pH-impedance testing might perform best for predicting early lung involvement in SSc.