Jeon Hajin, Lee Jong Lyul, Shim Hyeran, Joe Soobok, Byeon Iksu, Kim Chan Wook, Lim Seok-Byung, Park In Ja, Yoon Yong Sik, Chu Hoang Bao Khanh, Kim Young-Joon, Yu Chang Sik, Yang Jin Ok
Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB), Daejeon, Republic of Korea.
Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea.
PLoS One. 2025 May 23;20(5):e0323302. doi: 10.1371/journal.pone.0323302. eCollection 2025.
Colorectal cancer (CRC) has the second-highest incidence rate among all cancers in Korea, and approximately 30% of patients with regional CRC experience recurrence. Understanding the genetic drivers of recurrence is essential for early detection and targeted treatment. Many studies have therefore focused on genetic analysis of matched tumor-normal samples, as this approach provides more comprehensive insights. However, tumor-only samples are far more common in clinical practice because normal tissue is difficult to obtain, which makes robust methods for analyzing tumor-only data a pressing need. This study aimed to investigate the genetic variations associated with CRC recurrence using tumor-only whole-exome sequencing data from 200 Korean patients with stage III CRC. By applying stringent filtering against public databases, including the Genome Aggregation Database (gnomAD), Exome Aggregation Consortium (ExAC), Single Nucleotide Polymorphism Database (dbSNP), 1000 Genomes Project (1000G), Korean Variant Archive 2 (KOVA2), and Korean Reference Genome Database (KRGDB), we identified 221 statistically significant mutations across 195 genes whose distributions differed between the recurrence and non-recurrence groups. Statistical analysis of the clinical data further showed that the T-category, N-category, and preoperative carcinoembryonic antigen levels were correlated with CRC recurrence. Protein-protein interaction analysis identified nine networks, and we determined which of these networks had high feature importance. We also developed a CRC recurrence prediction model using PyCaret, which achieved an area under the curve (AUC) of 0.77. Our findings highlight the importance of robust variant filtering in tumor-only sample analyses and provide insights into the genetic landscape of CRC recurrence in the Korean population.