Kai Jing, Yang Luyao, AbuElela Ayman F, Abdel-Haleem Alyaa M, AlAmoodi Asma S, Bin Nafisah Abdulghani A, Alshaibani Alfadel, Alzahrani Ali S, Lagani Vincenzo, Gomez-Cabrero David, Gao Xin, Merzaban Jasmeen S
Bioscience Program, King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955-6900, Saudi Arabia.
Computer Science Program, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Centre (CBRC), Thuwal 23955-6900, Saudi Arabia.
Cell Rep Methods. 2025 Aug 18;5(8):101140. doi: 10.1016/j.crmeth.2025.101140. Epub 2025 Aug 11.
We identified a gene panel comprising 71 glycosyltransferases (GTs) that alter glycan patterns on cancer cells as they become more virulent. When these cancer-pattern GTs (CPGTs) were run through an algorithm trained on The Cancer Genome Atlas, they differentiated tumors from healthy tissue with 97% accuracy and clustered 27 cancers with 94% accuracy in external validation, revealing each variety's "biometric glycan ID." Using machine learning, we built four models for cancer classification, including two for detecting the molecular subtypes of breast cancer and glioma using even smaller CPGT sets. Our results reveal the power of using glyco-genes for diagnostics: Our breast cancer classifier was almost twice as effective in independent testing as the widely used prediction analysis of microarray 50 (PAM50) subtyping kit at differentiating between luminal A, luminal B, HER2-enriched, and basal-like breast cancers based on a comparable number of genes. Only four GT genes were needed to build a prognostic model for glioma survival.
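The classification idea described above (predicting a sample's class from expression values of a fixed gene panel) can be illustrated with a minimal sketch. This is not the authors' model: it uses a plain nearest-centroid rule as a stand-in, the data are synthetic Gaussian noise rather than The Cancer Genome Atlas, and every name (`make_synthetic_samples`, `fit_centroids`, `predict`, `accuracy`), the shift size, and the sample counts are hypothetical. Only the panel size of 71 comes from the abstract.

```python
import random
from statistics import mean

N_GENES = 71  # panel size from the abstract; everything else here is an assumption


def make_synthetic_samples(n_per_class, shift, rng):
    """Build fake expression profiles: unit Gaussian noise per gene,
    with 'tumor' samples shifted upward on the first 10 genes."""
    profiles, labels = [], []
    for label in ("normal", "tumor"):
        for _ in range(n_per_class):
            profile = [rng.gauss(0.0, 1.0) for _ in range(N_GENES)]
            if label == "tumor":
                for g in range(10):
                    profile[g] += shift
            profiles.append(profile)
            labels.append(label)
    return profiles, labels


def fit_centroids(profiles, labels):
    """Compute the per-gene mean expression (centroid) of each class."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, l in zip(profiles, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, profile):
    """Assign the class whose centroid has the smallest squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(profile, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, profiles, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, p) == l for p, l in zip(profiles, labels))
    return hits / len(labels)


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible
train_x, train_y = make_synthetic_samples(50, shift=2.0, rng=rng)
test_x, test_y = make_synthetic_samples(50, shift=2.0, rng=rng)  # separate held-out draw

model = fit_centroids(train_x, train_y)
print(f"held-out accuracy on synthetic data: {accuracy(model, test_x, test_y):.2f}")
```

The accuracy printed here reflects only how separable the synthetic classes were made and says nothing about the reported 97% and 94% figures. Scoring on a separately drawn test set mirrors, in a very loose way, the distinction the abstract draws between training and external validation.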