Assaf Rabih, Rammal Abbas, Goupil Alban, Kacim Mohammad, Vrabie Valeriu
Faculty of Arts and Sciences, Department of Mathematics, Holy Spirit University of Kaslik, Jounieh, Lebanon.
Faculty of Arts and Sciences, Mathematics and Computer Sciences Department, Lebanese American University, Beirut, Lebanon.
BMC Biomed Eng. 2025 Apr 2;7(1):4. doi: 10.1186/s42490-025-00089-1.
COVID-19 has claimed thousands of lives over the past years. Although laboratory testing for the pathogen is the established standard, it has a significant drawback: a notable rate of false negatives. Alternative diagnostic approaches are therefore urgently needed. In response to this need for accurate, parameter-free methods for identifying COVID-19, particularly in lung images, we introduce an approach that combines topological data analysis with machine learning. The method extracts persistent homology features from lung images, capturing the intrinsic topological properties of the data. These features then serve as inputs to several machine learning classifiers. Our primary objective is to detect COVID-19 with high accuracy while demonstrating the effectiveness of the topological features. In our experiments, the Random Forest classifier and the Support Vector Machine (SVM) outperform the other models in classifying CT scan lung images, with an accuracy of 97.5% for the Random Forest model and an AUC above 0.99 for the SVM. Two further comparisons, one applying the model to the same data with the topological features excluded and one applying the same model with topological features to other data, support the effectiveness of these features in the classification task.
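To make the feature-extraction step concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which filtration or homology dimensions were used, so this example assumes a sublevel-set filtration of a grayscale image with 4-connectivity and computes only 0-dimensional persistence (connected components) with a union-find structure. The function names `sublevel_persistence_h0` and `persistence_features`, and the three summary statistics, are hypothetical choices made for illustration.

```python
def sublevel_persistence_h0(image):
    """Finite (birth, death) pairs of connected components for the
    sublevel-set filtration of a 2D grid of intensities (4-connectivity).
    Assumption: this filtration is illustrative, not taken from the paper."""
    rows, cols = len(image), len(image[0])
    # Pixels enter the filtration in order of increasing intensity.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over pixels already in the filtration
    birth = {}   # birth value stored at each component root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (r + dr, c + dc)
            if q not in parent:
                continue  # neighbour is outside the grid or not yet added
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the component born later dies at the merge value.
            old, young = (a, b) if birth[a] <= birth[b] else (b, a)
            if value > birth[young]:  # skip zero-persistence pairs
                pairs.append((birth[young], value))
            parent[young] = old
    # The oldest component never dies and is omitted from the finite pairs.
    return pairs


def persistence_features(pairs):
    """Hypothetical summary vector: [number of bars, total persistence,
    maximum persistence]."""
    lifetimes = [death - birth for birth, death in pairs]
    if not lifetimes:
        return [0, 0, 0]
    return [len(lifetimes), sum(lifetimes), max(lifetimes)]


# Toy 3x3 "image": three dark basins separated by a brighter ridge.
toy = [
    [0, 5, 1],
    [5, 5, 5],
    [2, 5, 9],
]
toy_pairs = sublevel_persistence_h0(toy)
toy_features = persistence_features(toy_pairs)
```

In a full pipeline, one such feature vector per CT image would be stacked into a matrix and passed to a classifier, for example scikit-learn's `RandomForestClassifier` or `SVC`. A dedicated topological data analysis library would normally replace this hand-written routine and would also provide higher-dimensional features such as loops.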