Central Laboratory, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China; Anhui Province Key Laboratory of Non-coding RNA Basic and Clinical Transformation, Wuhu, Anhui, China.
Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China.
Int Immunopharmacol. 2024 Dec 5;142(Pt A):113033. doi: 10.1016/j.intimp.2024.113033. Epub 2024 Sep 2.
Colorectal cancer (CRC) is the third most prevalent cancer globally, posing a significant challenge due to its high rate of metastasis. Approximately 20% of patients with CRC present with distant metastases at diagnosis, and over 50% develop metastases within five years. Accurate prediction of metastasis is crucial for improving survival outcomes in patients with CRC.
This study introduces an innovative cost-sensitive fast correlation-based filter (CS-FCBF) algorithm for feature selection and integrates it with machine learning techniques to predict metastatic CRC. The CS-FCBF algorithm reduced the number of genomic features from 184 to 9 critical genes: CXCL9, C2CD4B, RGCC, GFI1, BEX2, CXCL3, FOXQ1, PBK, and PLAG1. The findings were validated by combining in vitro and in vivo experiments with analysis of publicly available single-cell RNA-seq datasets.
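For readers unfamiliar with FCBF, the sketch below shows the general shape of a fast correlation-based filter. Standard FCBF ranks discretized features by their symmetrical uncertainty (SU) with the class label. It then discards any feature that is more strongly correlated with an already-selected, more relevant feature than with the class. The abstract does not say how costs enter CS-FCBF. As a loudly labeled assumption, this sketch weights every sample by a hypothetical per-class cost (`class_cost`) when it estimates the probabilities inside SU. The function names `cs_fcbf`, `symmetrical_uncertainty`, and `weighted_entropy` are illustrative, not the authors' code.

```python
import math
from collections import defaultdict


def weighted_entropy(values, weights):
    """Shannon entropy (bits) of a discrete variable, using sample weights as counts."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[v] += w
    z = sum(totals.values())
    return -sum((c / z) * math.log2(c / z) for c in totals.values() if c > 0)


def symmetrical_uncertainty(x, y, weights):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx = weighted_entropy(x, weights)
    hy = weighted_entropy(y, weights)
    if hx + hy == 0:
        return 0.0
    hxy = weighted_entropy(list(zip(x, y)), weights)  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def cs_fcbf(features, labels, class_cost, delta=0.0):
    """Hypothetical cost-sensitive FCBF sketch.

    features:   dict of feature name -> list of discretized values, one per sample
    labels:     list of class labels, one per sample
    class_cost: dict of label -> cost weight (ASSUMED weighting scheme, not from the paper)
    Returns the selected feature names, most relevant first.
    """
    weights = [class_cost[c] for c in labels]
    # Step 1: relevance SU(f, C); keep features strictly above delta, sorted descending.
    relevance = {
        name: symmetrical_uncertainty(vals, labels, weights)
        for name, vals in features.items()
    }
    ranked = sorted(
        (name for name, su in relevance.items() if su > delta),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Step 2: a feature is redundant if some already-kept, more relevant feature p
    # satisfies SU(p, f) >= SU(f, C) (an approximate Markov blanket).
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name], weights) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

In a toy run with labels `[0, 0, 0, 0, 1, 1]` and hypothetical costs `{0: 1.0, 1: 2.0}`, a feature that copies the labels is kept. An exact duplicate of it is dropped as redundant, and a constant feature is dropped as irrelevant. Up-weighting the minority class is only one plausible way to make the filter cost-sensitive.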
Applying the CS-FCBF algorithm significantly improved prediction model performance, with an average increase of 21.16% in the area under the precision-recall curve (AUPRC). The nine identified genes hold potential as diagnostic biomarkers and therapeutic targets for metastatic CRC.
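The area under the precision-recall curve (AUPRC) is commonly used for imbalanced problems such as this one. A classifier that ranks samples at random has an expected AUPRC close to the positive-class prevalence rather than 0.5. The minimal sketch below estimates AUPRC as step-wise average precision and computes a percentage change between two models. It assumes binary labels (1 = metastatic). The abstract does not state whether the 21.16% figure is a relative or an absolute change, so `relative_increase_pct` is only one hypothetical reading. Ties in scores are broken by input order, which is a simplification compared with library implementations that group tied thresholds.

```python
def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive sample."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive samples")
    # Rank samples by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision at this cut-off; recall rises by 1 / n_pos here
    return total / n_pos


def relative_increase_pct(before, after):
    """Percentage change of a metric relative to its baseline value (hypothetical reading)."""
    return 100.0 * (after - before) / before
```

For example, with labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.2, 0.1]`, the positives sit at ranks 1 and 3. The average precision is therefore (1/1 + 2/3) / 2 = 5/6.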
This study highlights the critical role of advanced feature selection methods, combined with machine learning, in addressing class imbalance in medical diagnosis, particularly for CRC. Early detection of metastasis is vital, and the identified genes appear to be important in the metastatic process of CRC. The methodology applied here offers valuable insights and paves the way for future research in other cancers or diseases that face similar diagnostic challenges.