Ma Jun, Yin Hongyun, Hao Xiaohui, Sha Wei, Cui Haiyan
Department of Tuberculosis and Shanghai Key Lab of Tuberculosis, Shanghai Pulmonary Hospital Affiliated to Tongji University School of Medicine, 507 Zhengmin Road, Shanghai 200433, China.
Clinic and Research Center of Tuberculosis, Shanghai Key Lab of Tuberculosis, Shanghai Pulmonary Hospital, Tongji University School of Medicine, 507 Zhengmin Road, Shanghai 200433, China.
Am J Transl Res. 2021 Jun 15;13(6):6166-6174. eCollection 2021.
To identify significant diagnostic factors and establish a predictive model for distinguishing sarcoidosis from tuberculosis.
This study included 252 patients (123 with pulmonary sarcoidosis and 129 with pulmonary tuberculosis) who underwent laboratory evaluation, including routine hematologic testing, serum immunology, blood coagulation, angiotensin-converting enzyme, and T lymphocyte subsets. Factors that differed significantly between the two groups were first identified with an independent-samples t test and then entered into a random forest model, whose classification function was used to distinguish the two diseases. The diagnostic performance of the random forest model was then evaluated, and the model was used to identify the individual contribution of each diagnostic factor.
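The screening step described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes Welch's unequal-variance form of the t statistic and a normal approximation for the two-sided p-value (reasonable for groups of more than 100 patients, crude for the tiny toy samples here). The feature names, the values, and the 0.05 cutoff are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def screen_features(group_a, group_b, alpha=0.05):
    """Keep features whose two-sided p-value is below alpha.

    The p-value uses a normal approximation to the t distribution,
    which is an assumption of this sketch.
    """
    kept = []
    for name in group_a:
        t = welch_t(group_a[name], group_b[name])
        p = 2 * (1 - NormalDist().cdf(abs(t)))
        if p < alpha:
            kept.append(name)
    return kept


# Hypothetical toy values, not data from the study.
sarcoidosis = {
    "ace": [62, 58, 71, 66, 60, 69],
    "marker_x": [5.1, 4.8, 5.3, 5.0, 4.9, 5.2],
}
tuberculosis = {
    "ace": [31, 35, 28, 40, 33, 30],
    "marker_x": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
}

selected = screen_features(sarcoidosis, tuberculosis)
print(selected)
```

Only the features that pass this filter would then be passed to the classifier; in this toy example `marker_x` is dropped because its group means barely differ.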
The random forest model had a classification error rate of 24.9%. Among the statistically significant diagnostic factors, angiotensin-converting enzyme and prothrombin time made the greatest and second-greatest individual contributions, respectively. The area under the receiver operating characteristic (ROC) curve of the random forest prediction model was 0.915.
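The two performance figures reported above measure different things, which a small sketch may make concrete. The error rate depends on a single decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds. The code below computes both from predicted scores. The scores, the labels, and the 0.5 threshold are hypothetical, and the abstract does not state how the authors computed either figure.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Fraction of cases misclassified at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(p != y for p, y in zip(preds, labels))
    return wrong / len(labels)


# Hypothetical model outputs: 1 = sarcoidosis, 0 = tuberculosis.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(auc(scores, labels))         # 0.875
print(error_rate(scores, labels))  # 0.25
```

In this toy case two of eight patients fall on the wrong side of the threshold, yet 14 of the 16 positive-negative pairs are ranked correctly, so a moderate error rate and a high AUC can coexist.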
By incorporating statistically significant diagnostic factors, the random forest model can distinguish between sarcoidosis and tuberculosis and has potential clinical application value.