Liu Yongjun, Xu Yuqing, Li Xiaoxing, Chen Mengke, Wang Xueqin, Zhang Ning, Zhang Heping, Zhang Zhengjun
Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, WA, USA.
Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA.
NPJ Precis Oncol. 2024 Jan 20;8(1):13. doi: 10.1038/s41698-024-00512-1.
The goal of this study was to use a new interpretable machine-learning framework based on max-logistic competing risk factor models to identify a parsimonious set of differentially expressed genes (DEGs) that play a pivotal role in the development of colorectal cancer (CRC). Transcriptome data from nine public datasets were analyzed, and a new Chinese cohort was collected to validate the findings. The study discovered a set of four critical DEGs - CXCL8, PSMC2, APP, and SLC20A1 - that exhibit the highest accuracy in detecting CRC across diverse populations and ethnicities. Notably, PSMC2 and CXCL8 appear to play a central role in CRC, and CXCL8 alone could potentially serve as an early-stage marker for CRC. This work represents a pioneering effort in applying the max-logistic competing risk factor model to identify critical genes for human malignancies. The interpretability and reproducibility of the results across diverse populations suggest that the four DEGs identified can provide a comprehensive description of the transcriptomic features of CRC. The practical implications of this research include the potential for personalized risk assessment, precision diagnosis, and tailored treatment plans for patients.
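The abstract does not spell out the form of the max-logistic competing risk factor model, so the following is only an illustrative sketch under one common reading: several linear "risk factor" components compete, and the largest one is passed through a logistic link to give the probability of disease. The grouping of the four genes into two components, the intercepts, the coefficients, and the expression values are all hypothetical and are not taken from the study.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic link: maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def max_logistic_risk(expr, components):
    """Return (risk, winner) for one sample.

    expr       -- dict mapping gene name to an expression value
    components -- list of (intercept, {gene: weight}) linear scores

    Assumed model form (not confirmed by the abstract):
        P(disease) = logistic(max_k (intercept_k + sum_g weight_kg * expr_g))
    """
    scores = [
        intercept + sum(w * expr[g] for g, w in weights.items())
        for intercept, weights in components
    ]
    # The component with the largest linear score "wins" the competition.
    winner = max(range(len(scores)), key=scores.__getitem__)
    return logistic(scores[winner]), winner


# Hypothetical components and coefficients, for illustration only.
components = [
    (-1.0, {"CXCL8": 0.8, "PSMC2": 0.5}),
    (-2.0, {"APP": -0.3, "SLC20A1": 1.2}),
]

# Hypothetical (already normalized) expression values for one sample.
sample = {"CXCL8": 2.0, "PSMC2": 1.0, "APP": 0.5, "SLC20A1": 1.0}

risk, winner = max_logistic_risk(sample, components)
print(f"winning component: {winner}, estimated risk: {risk:.3f}")
```

Under this reading, interpretability comes from the winning component: for each sample, one small group of genes determines the predicted risk, so it is possible to say which group drove the prediction.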