Fu Bo, Liu Pei, Lin Jie, Deng Ling, Hu Kejia, Zheng Hong
IEEE Trans Biomed Eng. 2018 Nov 22. doi: 10.1109/TBME.2018.2882867.
Breast cancer poses a serious threat to Chinese women, with high morbidity and mortality. Without robust prognosis models, doctors have difficulty preparing an appropriate treatment plan that may prolong patient survival. This paper proposes MP4Ei, an alternative prognosis model framework for predicting Invasive Disease-Free Survival (iDFS) in early-stage breast cancer patients. MP4Ei performs very well at predicting relapse or metastasis of breast cancer in Chinese patients within 5 years.
MP4Ei is built on statistical theory and the gradient boosting decision tree framework. The study includes 5246 patients with early-stage (stage I-III) breast cancer from the Clinical Research Center for Breast (CRCB) at West China Hospital of Sichuan University. Stratified feature selection, combining statistical and ensemble methods, selects 23 of the 89 patient features covering demographics, diagnosis, pathology, and therapy. The 23 selected features are then fed as input variables to the XGBoost algorithm, with Bayesian parameter tuning and cross-validation, to find the optimal simplified model for 5-year iDFS prediction.
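The statistical stage of the feature selection can be illustrated with a minimal sketch. The abstract does not say which statistical test MP4Ei uses, so the Pearson chi-square filter below is an assumption chosen only for illustration, and the feature names and toy values are hypothetical rather than taken from the CRCB data.

```python
def chi_square_2x2(feature, outcome):
    """Pearson chi-square statistic for two binary (0/1) sequences."""
    n = len(feature)
    # Observed counts of the 2x2 contingency table.
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for f, y in zip(feature, outcome):
        counts[(f, y)] += 1
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            row_total = counts[(a, 0)] + counts[(a, 1)]
            col_total = counts[(0, b)] + counts[(1, b)]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def select_top_k(features, outcome, k):
    """Rank features by chi-square score and keep the k highest."""
    scores = {name: chi_square_2x2(vals, outcome) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = event within 5 years, 0 = no event.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feature_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the outcome
    "feature_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partially associated
}
selected = select_top_k(features, outcome, k=2)
```

In a pipeline like the one described, a filter of this kind would reduce the candidate set before an ensemble-based ranking and the final XGBoost fit; those later stages are not reproduced here.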
With 4196 eligible patients (80%) used for training and 1050 (20%) for testing, MP4Ei achieves competitive accuracy, with an AUC of 0.8451, a statistically significant advantage (p < 0.05).
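The split sizes and the AUC metric can be made concrete with a short sketch. It assumes a plain random 80/20 split with the test size rounded up, which reproduces the reported 4196/1050 counts; the abstract does not state how the split was actually drawn. The function names, the seed, and the four toy scores are hypothetical.

```python
import math
import random


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out ceil(n_samples * test_fraction) for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train_idx, test_idx = random_split(5246)

# Toy predictions, not study data: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this pairwise reading, an AUC of 0.8451 means that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one in roughly 85% of pairs.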
This work demonstrates a complete iDFS prognosis model with very competitive performance.
The proposed method could be used in clinical practice to predict patients' prognosis and future survival status, which may help doctors make treatment plans.