Kucukakcali Zeynep, Akbulut Sami, Colak Cemil
Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya 44280, Türkiye.
Surgery and Liver Transplant Institute, Inonu University Faculty of Medicine, Malatya 44280, Türkiye.
World J Clin Cases. 2025 Jul 16;13(20):104556. doi: 10.12998/wjcc.v13.i20.104556.
Endometriosis is a clinical condition characterized by the presence of endometrial glands outside the uterine cavity. Although its incidence remains largely uncertain, endometriosis affects around 180 million women worldwide. Several epidemiological and clinical explanations have been proposed, but the precise mechanism underlying the disease remains unclear. In recent years, researchers have examined the hereditary dimension of the disease. Genetic research has aimed to discover the gene or genes responsible for the disease through association or linkage studies involving candidate genes or DNA mapping techniques.
To identify genetic biomarkers linked to endometriosis by applying machine learning (ML) approaches.
This case-control study used an open-access transcriptomic data set comprising an endometriosis group and a control group. We included data from 22 controls and 16 endometriosis patients. For classification, we used AdaBoost, XGBoost, Stochastic Gradient Boosting, and Bagged Classification and Regression Trees (CART) with five-fold cross-validation. We evaluated model performance using accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score.
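The five-fold split described above can be sketched as follows. This is a minimal illustration only: the abstract does not say whether the folds were stratified, how samples were shuffled, or which software was used, so the `stratified_kfold` helper, the seed, and the stratification itself are assumptions. Only the group sizes (22 controls, 16 patients) come from the text.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that roughly preserve class proportions.

    Hypothetical helper for illustration; not the study's actual procedure.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# 0 = control (n = 22), 1 = endometriosis (n = 16), as stated in the abstract.
labels = [0] * 22 + [1] * 16
splits = list(stratified_kfold(labels))
for fold, (train, test) in enumerate(splits, 1):
    n_cases = sum(labels[i] for i in test)
    print(f"fold {fold}: train={len(train)} test={len(test)} cases_in_test={n_cases}")
```

With only 38 samples, each held-out fold contains seven to nine samples, so per-fold metrics move in coarse steps; stratifying keeps at least a few cases and controls in every fold.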
Bagged CART gave the best classification metrics. The metrics obtained from this model were 85.7%, 85.7%, 100%, 75%, 75%, 100% and 85.7% for accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score, respectively. Based on the variable importance derived from the model, the genes , , , , , , , , and and other transcripts with inaccessible gene names can be considered potential biomarkers for endometriosis.
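For reference, the seven performance measures listed above all follow from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts (3 true positives, 1 false positive, 3 true negatives, 0 false negatives) chosen only because they reproduce the reported sensitivity, specificity, predictive values, accuracy and F1 score. The abstract does not give the actual counts, and `classification_metrics` is an illustrative name.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }

# Hypothetical counts, not taken from the study.
m = classification_metrics(tp=3, fp=1, tn=3, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Because balanced accuracy is the mean of sensitivity and specificity, values of 100% and 75% give 87.5% under this definition, so the reported balanced accuracy of 85.7% may be worth checking against the full text.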
This study identified possible genomic biomarkers for endometriosis using transcriptomic data from patients with and without endometriosis. The applied ML model successfully classified endometriosis and yielded a highly accurate diagnostic prediction model. Future genomic studies could explain the underlying pathology of endometriosis, and a non-invasive diagnostic method could replace the invasive ones.