Hsu Te-Cheng, Lin Che
Institute of Communications Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan.
Graduate Institute of Communication Engineering, National Taiwan University, Taipei 10617, Taiwan.
Bioinform Adv. 2023 Jan 9;3(1):vbac100. doi: 10.1093/bioadv/vbac100. eCollection 2023.
Cancer is one of the world's leading causes of mortality, and its prognosis is hard to predict because of complex biological interactions among heterogeneous data types. Numerous challenges, such as censoring, high dimensionality and small sample sizes, prevent researchers from using deep learning models for precise prediction.
We propose SCAN, a robust Semi-supervised Cancer prognosis classifier with bAyesian variational autoeNcoder, as a structured machine-learning framework for cancer prognosis prediction. SCAN incorporates semi-supervised learning to predict 5-year disease-specific survival and overall survival in breast cancer and non-small cell lung cancer (NSCLC) patients, respectively. SCAN achieved significantly better AUROC scores than all existing benchmarks (81.73% for breast cancer; 80.46% for NSCLC), including our previously proposed bimodal neural network classifiers (77.71% for breast cancer; 78.67% for NSCLC). In independent validation, SCAN still achieved better AUROC scores (74.74% for breast cancer; 72.80% for NSCLC) than the bimodal neural network classifiers (64.13% for breast cancer; 67.07% for NSCLC). SCAN is general and can potentially be trained on more patient data. This lays the foundation for personalized medicine in early cancer risk screening.
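The abstract does not spell out how censored records relate to the semi-supervised setup or how the AUROC figures are computed. The following is a minimal illustrative sketch under stated assumptions, not the SCAN implementation: follow-up time is assumed to be in months; the label coding (1 = event within 5 years, 0 = event-free at 5 years) is an assumption; and patients censored before the 5-year mark are treated as unlabeled, which is one plausible reading of why semi-supervised learning helps here. The names `five_year_label` and `auroc` are hypothetical helpers. The AUROC uses the standard Mann-Whitney pairwise formulation (the probability that a random positive outscores a random negative, ties counting one half).

```python
from typing import List, Optional, Sequence, Tuple

# Assumption for illustration: follow-up time is measured in months.
FIVE_YEARS_MONTHS = 60.0


def five_year_label(time_months: float, event: bool) -> Optional[int]:
    """Hypothetical 5-year binary label from a (time, event) survival record.

    Returns 1 if the event occurred within 5 years, 0 if the patient is
    known to be event-free at the 5-year mark, and None if the record was
    censored before 5 years (outcome unknown, i.e. an unlabeled sample).
    """
    if event and time_months <= FIVE_YEARS_MONTHS:
        return 1
    if time_months >= FIVE_YEARS_MONTHS:
        return 0
    return None  # censored before the horizon: no label available


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney statistic over all positive/negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy (time in months, event observed) records and model risk scores.
    records: List[Tuple[float, bool]] = [
        (12.0, True), (80.0, False), (30.0, False), (70.0, True), (40.0, True)
    ]
    scores = [0.9, 0.2, 0.5, 0.7, 0.6]
    labels = [five_year_label(t, e) for t, e in records]
    print(labels)  # [1, 0, None, 0, 1]
    # Only labeled records enter the AUROC; the censored one would be unlabeled.
    labeled = [(s, y) for s, y in zip(scores, labels) if y is not None]
    print(auroc([s for s, _ in labeled], [y for _, y in labeled]))  # 0.75
```

In this toy example the record censored at 30 months carries no 5-year label, so a purely supervised classifier would have to discard it, while a semi-supervised model could still use its features.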
The source code reproducing the main results is available on GitHub: https://gitfront.io/r/user-4316673/36e8714573f3fbfa0b24690af5d1a9d5ca159cf4/scan/.
Supplementary data are available at Bioinformatics Advances online.