Nfor Oswald Ndi, Huang Pei-Ming, Wu Ming-Fang, Chen Ke-Cheng, Chou Ying-Hsiang, Lin Mong-Wei, Zhong Ji-Han, Kuo Shuenn-Wen, Lee Yu-Kwang, Hsu Chih-Hung, Lee Jang-Ming, Liaw Yung-Po
Department of Public Health, Institute of Public Health, Chung Shan Medical University, No.110, Sec.1, Jianguo North Road, Taichung, 40201, Taiwan.
Department of Medicine, National Taiwan University College of Medicine, No.1, Sec.1, Jen-Ai Road, Taipei, 100233, Taiwan.
J Transl Med. 2025 Mar 28;23(1):379. doi: 10.1186/s12967-025-06383-9.
Esophageal cancer (EC) presents a significant public health challenge globally, particularly in regions with high alcohol consumption. Its etiology is multifactorial, involving both genetic predispositions and lifestyle factors.
This study aimed to develop a personalized risk prediction model for EC by integrating genetic polymorphisms (rs671 and rs1229984) with virtually generated alcohol consumption data, utilizing advanced artificial intelligence and machine learning techniques. We analyzed data from 86,845 individuals, including 763 diagnosed EC patients, sourced from the Taiwan Biobank. Eight machine learning models were employed: Bayesian Network, Decision Tree, Ensemble, Gradient Boosting, Logistic Regression, LASSO, Random Forest, and Support Vector Machines (SVM). A unique aspect of our approach was the virtual generation of alcohol consumption data, allowing us to evaluate risk profiles under both consuming and non-consuming scenarios.
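One way to read the "virtual generation" step is as counterfactual scoring: the same individual is scored twice by a fitted model, once with the alcohol indicator set to consuming and once to non-consuming. The sketch below illustrates that idea with a plain logistic function. The coefficient values, feature encoding, and function names are hypothetical and chosen only for illustration; they are not estimates or methods taken from the study.

```python
import math

# Hypothetical coefficients for illustration only; NOT values from the study.
COEF = {
    "intercept": -4.0,
    "rs671_AG": 1.2,
    "rs1229984_CC": 0.9,
    "alcohol": 1.5,
}


def predicted_risk(rs671_ag: bool, rs1229984_cc: bool, drinks: bool) -> float:
    """Logistic-model probability of EC for one genotype/alcohol profile."""
    z = (
        COEF["intercept"]
        + COEF["rs671_AG"] * int(rs671_ag)
        + COEF["rs1229984_CC"] * int(rs1229984_cc)
        + COEF["alcohol"] * int(drinks)
    )
    return 1.0 / (1.0 + math.exp(-z))


def scenario_risks(rs671_ag: bool, rs1229984_cc: bool) -> dict:
    """Score the same genotype under both virtually assigned alcohol scenarios."""
    return {
        "consuming": predicted_risk(rs671_ag, rs1229984_cc, drinks=True),
        "non_consuming": predicted_risk(rs671_ag, rs1229984_cc, drinks=False),
    }


if __name__ == "__main__":
    # Profile with rs671 = AG and rs1229984 = CC, scored under both scenarios.
    for scenario, risk in scenario_risks(True, True).items():
        print(f"{scenario}: {risk:.4f}")
```

Any of the eight model families listed above could stand in for the logistic function here; the point of the sketch is only that the alcohol variable is assigned rather than observed.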
Individuals with the genotypes rs671 = AG and rs1229984 = CC exhibited the highest probabilities of developing EC, with values ranging from 0.2041 to 0.9181. Moving from alcohol consumption to abstinence could decrease their risk by approximately 16.29-49.58%, a potential reduction of nearly 50% for individuals with high-risk genotypes. The Ensemble model demonstrated exceptional performance, achieving an area under the curve (AUC) of 0.9577 and a sensitivity of 0.9211.
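The quantities reported above (percentage risk reduction, sensitivity, and AUC) follow from standard definitions, sketched below in plain Python. The abstract does not state whether its reduction figures are relative or absolute; the sketch assumes a relative reduction. The function names and the toy inputs in the usage lines are hypothetical and are not study data.

```python
def relative_reduction_pct(risk_consuming: float, risk_abstaining: float) -> float:
    """Relative risk reduction (%) when moving from consuming to abstaining."""
    if risk_consuming <= 0:
        raise ValueError("risk_consuming must be positive")
    return (risk_consuming - risk_abstaining) / risk_consuming * 100.0


def sensitivity(y_true, y_pred) -> float:
    """True-positive rate: TP / (TP + FN), with labels coded 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores) -> float:
    """AUC as the share of case/control pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(relative_reduction_pct(0.50, 0.25))
    print(sensitivity([1, 1, 0, 1], [1, 0, 0, 1]))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise AUC is quadratic in sample size, which is fine for a small illustration; a rank-based formula would be the usual choice for a cohort of this scale.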
Overall, our findings highlight the importance of integrating virtually generated alcohol data for more precise personalized risk assessments for EC.