Kilicarslan Sabire, Hiz-Cicekliyurt Meliha Merve
Çanakkale Onsekiz Mart University, Graduate School of Sciences, Department of Medical System Biology, Çanakkale, Turkey.
Çanakkale Onsekiz Mart University, Faculty of Medicine, Department of Medical Biology, Çanakkale, Turkey.
Endocrine. 2025 Feb;87(2):758-771. doi: 10.1007/s12020-024-04068-9. Epub 2024 Oct 14.
Papillary thyroid cancer (PTC) is the predominant form of malignant tumor affecting the thyroid gland.
This study aimed to identify candidate biomarkers for papillary thyroid carcinoma using an integrative analysis of bioinformatics and machine learning (ML).
The PTC datasets GSE6004, GSE3467, and GSE33630 (species: Homo sapiens) were downloaded from NCBI and analyzed with the limma package to obtain differentially expressed genes (DEGs). Once the DEGs were identified, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed as the first step of the bioinformatics process. Subsequently, a protein-protein interaction (PPI) network was constructed in STRING from the genes common to the bioinformatics and machine learning analyses, to elucidate the important genes involved in PTC pathogenesis. In machine learning, gene discovery relies on feature selection to identify the key genes that distinguish biological states; a hybrid feature selection approach was used for this purpose. In the second step, the original datasets were preprocessed to detect and correct missing and noisy data and were then merged. After Linear and Discriminative Hybrid Feature Selection (LDHFS) was performed on the processed dataset, machine learning algorithms such as Random Forest (RF), Naive Bayes (NB), and Support Vector Machines (SVM) were applied.
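The idea of keeping only genes that are shared across analyses can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the DEG cutoffs, so the thresholds below (absolute log2 fold change of at least 1.0, adjusted p-value below 0.05) are assumptions, the function names `select_degs` and `common_genes` are hypothetical, and the gene names and statistics are invented placeholders rather than values from GSE6004, GSE3467, or GSE33630.

```python
from typing import Dict, Iterable, Set, Tuple

# gene name -> (log2 fold change, adjusted p-value); hypothetical structure.
GeneStats = Dict[str, Tuple[float, float]]


def select_degs(stats: GeneStats,
                lfc_cutoff: float = 1.0,     # assumed cutoff, not from the abstract
                padj_cutoff: float = 0.05    # assumed cutoff, not from the abstract
                ) -> Set[str]:
    """Return genes whose |log2FC| and adjusted p-value pass both cutoffs."""
    return {
        gene
        for gene, (lfc, padj) in stats.items()
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    }


def common_genes(gene_sets: Iterable[Set[str]]) -> Set[str]:
    """Intersect any number of gene sets (empty input gives an empty set)."""
    sets = list(gene_sets)
    return set.intersection(*sets) if sets else set()


# Invented placeholder statistics for three hypothetical datasets.
toy_datasets = [
    {"GENE_A": (2.1, 0.001), "GENE_B": (0.4, 0.010),
     "GENE_C": (-1.8, 0.020), "GENE_D": (1.5, 0.200)},
    {"GENE_A": (1.6, 0.004), "GENE_B": (1.2, 0.030),
     "GENE_C": (-2.2, 0.001), "GENE_D": (1.9, 0.010)},
    {"GENE_A": (1.1, 0.040), "GENE_C": (-1.0, 0.049),
     "GENE_D": (2.0, 0.030)},
]

# Genes passing the cutoffs in every dataset.
shared_degs = common_genes(select_degs(d) for d in toy_datasets)

# A gene list chosen by feature selection (also invented) can be
# intersected with the DEG list in the same way.
ml_selected = {"GENE_A", "GENE_D"}
candidates = common_genes([shared_degs, ml_selected])

print(sorted(shared_degs))
print(sorted(candidates))
```

In practice the per-gene statistics would come from a differential expression tool such as limma rather than from hand-written dictionaries; the sketch only shows the filtering and set-intersection logic.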
Bioinformatics and machine learning analyses indicate that the genes RXRG, CDH2, ETV5, QPCT, LRP4, FN1, and LPAR5 are integral to the progression of thyroid cancer. The RF algorithm gave the best performance, with an accuracy of 94.62%, a Kappa value of 91.36%, and an AUC of 96.13%. These results provide additional evidence for the genetic alterations of these genes and may accelerate the development of therapeutic and diagnostic methods in future research.
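For readers less familiar with the reported metrics, the following Python sketch shows how accuracy and Cohen's kappa are computed from a confusion matrix. It is an illustration under stated assumptions, not the study's evaluation code: the function name `accuracy_and_kappa` is hypothetical and the 2x2 matrix is invented. Kappa is conventionally reported on a scale with a maximum of 1, so the 91.36% in the abstract corresponds to 0.9136 on that scale.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples with true class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal (accuracy).
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented example: 100 samples, two balanced classes, 10 misclassified.
toy_confusion = [
    [45, 5],
    [5, 45],
]
acc, kappa = accuracy_and_kappa(toy_confusion)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than accuracy whenever chance agreement is above zero, which is consistent with the abstract reporting a Kappa value below the accuracy.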
Bioinformatics and machine learning techniques identified the common genes RXRG, CDH2, ETV5, QPCT, LRP4, FN1, and LPAR5 as PTC biomarkers, providing novel reference markers for the diagnosis and treatment of PTC patients. The model is expected to have significant predictive value and to assist in the early diagnosis and screening of clinical PTC. These insights contribute to PTC management and offer guidance for future research.