Department of Medical Genetics, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran.
Department of Radiology Technology, Shoushtar Faculty of Medical Sciences, Shoushtar, Iran.
BMC Bioinformatics. 2022 Oct 1;23(1):410. doi: 10.1186/s12859-022-04965-8.
BACKGROUND: We used a hybrid machine learning system (HMLS) strategy that includes an extensive search for the optimal HMLSs, combining feature selection algorithms, a feature extraction algorithm, and classifiers for diagnosing breast cancer. This study therefore aims to obtain a high-importance transcriptome profile, linked with classification procedures, that can facilitate the early detection of breast cancer.
METHODS: In the present study, 762 breast cancer patients and 138 solid tissue normal subjects were included. Three groups of machine learning (ML) algorithms were employed: (i) four feature selection procedures, employed and compared to select the most valuable features: (1) ANOVA; (2) Mutual Information; (3) Extra Trees Classifier; and (4) Logistic Regression (LGR); (ii) a feature extraction algorithm (Principal Component Analysis); and (iii) 13 classification algorithms combined with automated ML hyperparameter tuning: (1) LGR; (2) Support Vector Machine; (3) Bagging; (4) Gaussian Naive Bayes; (5) Decision Tree; (6) Gradient Boosting Decision Tree; (7) K-Nearest Neighbors; (8) Bernoulli Naive Bayes; (9) Random Forest; (10) AdaBoost; (11) ExtraTrees; (12) Linear Discriminant Analysis; and (13) Multilayer Perceptron (MLP). Balanced accuracy and the area under the curve (AUC) were used to evaluate the performance of the proposed models.
RESULTS: The combination of the LGR feature selection procedure and the MLP classifier achieved the highest balanced accuracy and AUC (balanced accuracy: 0.86, AUC = 0.94), followed by the combination of LGR feature selection and an LGR classifier (balanced accuracy: 0.84, AUC = 0.94). The AUC of the LGR + LGR combination was obtained with the following 20 biomarkers: TMEM212, SNORD115-13, ATP1A4, FRG2, CFHR4, ZCCHC13, FLJ46361, LY6G6E, ZNF323, KRT28, KRT25, LPPR5, C10orf99, PRKACG, SULT2A1, GRIN2C, EN2, GBA2, CUX2, and SNORA66.
CONCLUSIONS: The best performance was achieved using the LGR feature selection procedure with the MLP classifier. The results show that these 20 biomarkers had the highest scores (rankings) for breast cancer detection.
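The abstract does not specify how the LGR feature selection step ranked genes. One common approach, assumed here and not confirmed by the source, is to fit a logistic regression on standardized expression values and keep the features with the largest absolute coefficients; in practice a library such as scikit-learn (`LogisticRegression` together with `SelectFromModel`) would typically be used. The pure-Python sketch below only illustrates that idea: the function names, learning rate, epoch count, and L2 penalty are hypothetical choices, not the authors' settings.

```python
import math

# Hypothetical helpers illustrating coefficient-based (LGR) feature ranking;
# this is NOT the authors' code, and all hyperparameters are assumptions.


def standardize(X):
    """Z-score each column so coefficient magnitudes are comparable across genes."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.5, epochs=1000, l2=0.01):
    """Batch gradient descent on the L2-regularized mean log-loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def lgr_select_top_k(X, y, k):
    """Indices of the k features with the largest |coefficient|, best first."""
    w, _ = fit_logistic_regression(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.0, 0], [0.1, 1], [0.2, 0], [0.3, 1], [0.7, 0], [0.8, 1], [0.9, 0], [1.0, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(lgr_select_top_k(X, y, k=1))  # feature 0 should rank first
```

Standardizing first matters because raw coefficient sizes depend on each gene's expression scale; with k = 20 the same ranking step would yield a 20-gene panel analogous in spirit to the one reported above.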
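The abstract evaluates models with balanced accuracy and AUC. Balanced accuracy is especially relevant for this cohort because the classes are imbalanced (762 cancer vs. 138 normal samples). The minimal pure-Python sketch below assumes binary 0/1 labels (1 = cancer) and, for AUC, a real-valued score per sample; the function names are illustrative. The AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (the normalized Mann-Whitney U statistic), which equals the area under the ROC curve.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes, the average of sensitivity and specificity."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true, scores):
    """Binary ROC AUC: P(random positive scores above random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Class sizes from the abstract: 762 tumor (label 1) and 138 normal (label 0) samples.
y_true = [1] * 762 + [0] * 138
always_cancer = [1] * len(y_true)  # a trivial model that calls every sample cancer
plain_accuracy = sum(p == t for p, t in zip(always_cancer, y_true)) / len(y_true)
print(round(plain_accuracy, 3), balanced_accuracy(y_true, always_cancer))  # 0.847 0.5
print(roc_auc(y_true, [0.5] * len(y_true)))  # constant scores give 0.5
```

A model that labels every sample as cancer reaches a plain accuracy of 762/900 ≈ 0.847 but a balanced accuracy of only 0.5, which illustrates why balanced accuracy is the more informative figure here. The pairwise loop is O(n·m) (about 105,000 pairs at this cohort size); a rank-based formula scales better for larger data.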