Comparative performances of machine learning algorithms in radiomics and impacting factors.

Author information

Université Paris Cité, PARCC UMRS 970, INSERM, Paris, France.

Unité de Recherche Clinique, Centre d'Investigation Clinique 1418 Épidémiologie Clinique, Université Paris Cité, AP-HP, Hôpital Européen Georges Pompidou, INSERM, Paris, France.

Publication information

Sci Rep. 2023 Aug 28;13(1):14069. doi: 10.1038/s41598-023-39738-7.

Abstract

There are currently no recommendations on which machine learning (ML) algorithms should be used in radiomics. The objective was to compare the performance of ML algorithms in radiomics across different clinical questions, to determine whether some strategies give the best and most stable performance regardless of the dataset. This study compares nine feature selection algorithms combined with fourteen binary classification algorithms on ten datasets. These datasets included radiomics features and clinical diagnoses for binary clinical classifications, including COVID-19 pneumonia or sarcopenia on CT, and head and neck, orbital, or uterine lesions on MRI. For each dataset, a train-test split was created. Each of the 126 (9 × 14) combinations of feature selection and classification algorithms was trained and tuned using ten-fold cross-validation, then the area under the ROC curve (AUC) was computed. This procedure was repeated three times per dataset. The best overall performance was obtained with Joint Mutual Information (JMI) and Joint Mutual Information Maximisation (JMIM) as feature selection algorithms, and with random forest and linear regression models as classification algorithms. The choice of classification algorithm was the factor explaining the most performance variation (10% of total variance). The choice of feature selection algorithm explained only 2% of the variation, while the train-test split explained 9%.
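The evaluation loop described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification`), only two feature selectors and two classifiers are shown instead of nine and fourteen, ANOVA F-score and univariate mutual information stand in for JMI and JMIM (which scikit-learn does not provide), logistic regression stands in for the linear models, and the 70/30 split, the tuned parameter `k`, and computing AUC on the held-out test set are all assumptions made for the example.

```python
from functools import partial
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one radiomics dataset (assumption: not the study's data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Stand-in feature selectors (assumption: JMI/JMIM are not in scikit-learn).
selectors = {
    "anova_f": f_classif,
    "mutual_info": partial(mutual_info_classif, random_state=0),
}

# Stand-in classifiers (assumption: a small subset of the fourteen compared).
classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}


def evaluate(X, y, n_repeats=3):
    """Return one test-set AUC per (repeat, selector, classifier) combination."""
    results = []
    for repeat in range(n_repeats):
        # A fresh train-test split for each repetition.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=repeat
        )
        for (s_name, score_func), (c_name, clf) in product(
            selectors.items(), classifiers.items()
        ):
            pipe = Pipeline([
                ("scale", StandardScaler()),
                ("select", SelectKBest(score_func)),
                ("clf", clf),
            ])
            # Tune the number of kept features with ten-fold cross-validation.
            search = GridSearchCV(
                pipe,
                param_grid={"select__k": [5, 10]},
                scoring="roc_auc",
                cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=repeat),
            )
            search.fit(X_tr, y_tr)
            # Score the refitted best pipeline on the held-out test set.
            auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
            results.append(
                {"repeat": repeat, "selector": s_name, "classifier": c_name, "auc": auc}
            )
    return results


results = evaluate(X, y)
for row in results:
    print(row)
```

Placing feature selection inside the `Pipeline` means it is refitted within each cross-validation fold, so the validation folds do not leak into the selection step. Grouping the resulting AUC values by classifier, selector, or repeat gives a rough view of which factor drives the spread, in the spirit of the variance breakdown reported above.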

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c847/10462640/0c22010a66ac/41598_2023_39738_Fig1_HTML.jpg
