Modeling Epidemiology Data with Machine Learning Technique to Detect Risk Factors for Gastric Cancer.

Affiliations

Department of Photogrammetry and Remote Sensing, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, 19967-15433, Tehran, Iran.

Digestive Disease Research Center, Digestive Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran.

Publication details

J Gastrointest Cancer. 2024 Mar;55(1):287-296. doi: 10.1007/s12029-023-00952-1. Epub 2023 Jul 10.

Abstract

PURPOSE

Gastric cancer (GC) ranks as the 7th most common cancer worldwide and is a leading cause of cancer mortality. In Iran, stomach cancer is the most common fatal cancer, with an incidence higher than the world average. In recent years, methods such as machine learning, which combine health research with computational power and learning capacity, have attracted considerable attention for disease prediction and diagnosis. In this study, we aimed to model GC data from the Golestan Cohort Study (GCS) with gradient boosting, a machine learning technique, in order to find risk factors and identify GC cases.
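
The abstract names gradient boosting but not the implementation, loss function, tree depth, or how the reported impact rates were computed. The sketch below is a minimal pure-Python illustration under assumed choices: binary log loss, one-split regression trees (stumps) fit by least squares to the pseudo-residuals y - p, a fixed learning rate, and feature importance defined as each feature's share of the total squared-error reduction. The helpers `gradient_boost`, `fit_stump`, and `predict_proba` are hypothetical names; this is not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a one-split regression tree (a "stump") to the residuals.

    Returns (sse, feature, threshold, left_value, right_value), or None if no
    feature has two distinct values to split on.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # the largest value leaves the right side empty
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best


def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Binary gradient boosting on log loss with stumps as weak learners.

    Assumes y contains both classes (0 and 1).
    """
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1.0 - prior))  # start from the log-odds of the base rate
    F = [f0] * len(y)
    stumps = []
    gains = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # The negative gradient of log loss with respect to F is y - sigmoid(F).
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        mean_r = sum(residuals) / len(residuals)
        sse_before = sum((r - mean_r) ** 2 for r in residuals)
        best = fit_stump(X, residuals)
        if best is None:
            break
        sse, j, t, lv, rv = best
        gains[j] += sse_before - sse  # squared-error reduction credited to feature j
        stumps.append((j, t, lv, rv))
        # Shrunken additive update of every sample's score.
        F = [fi + learning_rate * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    total = sum(gains) or 1.0
    return {
        "f0": f0,
        "learning_rate": learning_rate,
        "stumps": stumps,
        "importance": [g / total for g in gains],  # normalized to sum to 1
    }


def predict_proba(model, x):
    """Probability of the positive class for a single feature vector x."""
    score = model["f0"]
    for j, t, lv, rv in model["stumps"]:
        score += model["learning_rate"] * (lv if x[j] <= t else rv)
    return sigmoid(score)


# Toy data (not study data): feature 0 separates the classes, feature 1 is noise.
X = [[1, 5], [2, 3], [3, 4], [4, 1], [6, 2], [7, 5], [8, 3], [9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(X, y, n_rounds=20, learning_rate=0.3)
print(model["importance"])  # → [1.0, 0.0]
```

On this toy data every stump splits on feature 0, so it receives all of the importance. Library implementations typically use deeper trees and second-order (Newton) leaf values, so their importance figures would not match this sketch.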

METHODS

Because the GC class (n = 280) was much smaller than the non-GC class (n = 49,467), the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset. Seventy percent of the data was used to train the gradient boosting algorithm and identify influential factors for gastric cancer, and the remaining 30% was used for accuracy assessment.
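
The two data-preparation steps can be sketched in pure Python. This is an illustrative assumption, not the study's pipeline: `smote` and `stratified_split` are hypothetical helpers, the SMOTE variant picks base points at random and interpolates toward one of their k nearest minority-class neighbors (Euclidean distance), and the split is stratified so each class keeps its 70/30 proportion. The abstract does not say whether SMOTE was applied before or after splitting; the usage below oversamples only the training portion, which keeps synthetic points out of the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class points by SMOTE-style interpolation.

    Each new point lies on the segment between a random minority sample and one
    of its k nearest minority-class neighbors. Needs at least two samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the base point
        neighbors = sorted(
            (minority[m] for m in range(len(minority)) if m != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def stratified_split(rows, labels, test_fraction=0.3, seed=0):
    """Random split in which each class contributes test_fraction of its rows to the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += [(rows[i], cls) for i in idx[:n_test]]
        train += [(rows[i], cls) for i in idx[n_test:]]
    return train, test


# Toy imbalanced data (not study data): 200 majority vs. 20 minority points.
gen = random.Random(42)
majority = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(200)]
minority = [[gen.gauss(3, 1), gen.gauss(3, 1)] for _ in range(20)]
rows = majority + minority
labels = [0] * 200 + [1] * 20

train, test = stratified_split(rows, labels, test_fraction=0.3)
train_maj = [r for r, lab in train if lab == 0]
train_min = [r for r, lab in train if lab == 1]
extra = smote(train_min, len(train_maj) - len(train_min), k=5)
balanced = train + [(r, 1) for r in extra]
print(len(train_maj), len(train_min) + len(extra))  # → 140 140
```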

RESULTS

Of the 19 factors examined, age, socioeconomic status, tea temperature, body mass index, gender, and education were the six most influential, with impact rates of 0.24, 0.16, 0.13, 0.13, and 0.07, respectively. The trained model correctly classified 70 of the 72 GC patients in the test set.
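
The test-set result for GC patients can be expressed as sensitivity (recall), the share of actual GC cases the model labels as GC. The sketch below only restates the abstract's counts (70 correct, so 2 missed); the `sensitivity` helper is a hypothetical name. The abstract does not report how the non-GC test participants were classified, so specificity cannot be computed from these figures.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positive cases that the model classifies as positive."""
    return true_positives / (true_positives + false_negatives)


# Counts from the abstract: 70 of 72 GC patients in the test set classified correctly.
tp, fn = 70, 72 - 70
print(f"sensitivity = {sensitivity(tp, fn):.3f}")  # → sensitivity = 0.972
```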

CONCLUSION

These results indicate that the model can effectively detect GC from important risk factors, without the need for invasive procedures. The model performs reliably when given an adequate amount of input data, and its accuracy and generalization improve substantially as the dataset grows. Overall, the trained system proved successful both in identifying risk factors and in detecting cancer patients.
