Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Cheng Kung University, Tainan, Taiwan.
Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan.
Hum Genomics. 2021 Jan 11;15(1):3. doi: 10.1186/s40246-020-00302-3.
Functional disruptions caused by large germline genomic structural variants in susceptibility genes are known risk factors for cancer. We used deletion structural variants (DSVs) called from germline whole-genome sequencing (WGS) data, together with the association between DSVs and the immune-related tumor microenvironment (TME), to predict cancer risk and prognosis.
We investigated the contribution of germline DSVs to cancer susceptibility and prognosis using in silico and causal inference models. DSVs were generated from germline WGS data of blood samples from 192 cancer and 499 non-cancer subjects. Clinical information, including family cancer history (FCH), was obtained from the National Cheng Kung University Hospital and Taiwan Biobank. Ninety-nine colorectal cancer (CRC) patients had immune response gene expression data. We used joint calling tools and an attention-weighted model to build the cancer risk predictive model and to identify DSVs in familial cancer. A survival support vector machine (survival-SVM) was used to select prognostic DSVs.
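As a rough illustration of what an attention-weighted risk model over DSV features can look like, the sketch below scores one subject from a binary DSV presence vector. It is not the authors' architecture, which the abstract does not describe: the function names `softmax` and `attention_risk`, the parameters `attn_logits`, `effect_weights`, and `bias`, and all numbers are hypothetical. The point it is meant to show is why such a model is called explainable: the attention weights sum to 1 and can be read as the relative importance of each DSV.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_risk(dsv_presence, attn_logits, effect_weights, bias=0.0):
    """Return (risk probability, attention weights) for one subject.

    dsv_presence   : 0/1 flags, one per DSV (1 = deletion present)
    attn_logits    : hypothetical learned attention scores, one per DSV
    effect_weights : hypothetical learned effect sizes, one per DSV
    """
    if not (len(dsv_presence) == len(attn_logits) == len(effect_weights)):
        raise ValueError("all inputs must have the same length")
    attn = softmax(attn_logits)
    # Attention-weighted sum of per-DSV contributions.
    pooled = sum(a * w * x for a, w, x in zip(attn, effect_weights, dsv_presence))
    # Logistic link turns the pooled score into a probability.
    risk = 1.0 / (1.0 + math.exp(-(pooled + bias)))
    return risk, attn


# Toy values for illustration only, not data from the study.
risk, attn = attention_risk([1, 0, 1], [2.0, 0.5, 1.0], [1.5, 0.2, 0.8])
```

In a trained model the attention scores and effect sizes would be fitted to the case and control data; here they are fixed by hand purely to make the mechanics visible.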
We identified 671 DSVs that could predict cancer risk. The area under the receiver operating characteristic (ROC) curve (AUC) of the attention-weighted model was 0.71. The 3 most frequent DSV genes observed in cancer patients were ADCY9, AURKAPS1, and RAB3GAP2 (p < 0.05). The DSVs in SGSM2 and LHFPL3 were relevant to colorectal cancer. We found a higher incidence of FCH in cancer patients than in non-cancer subjects (p < 0.05). The SMYD3 and NKD2 DSV genes were associated with cancer patients with FCH (p < 0.05). We identified 65 immune-associated DSV markers for assessing cancer prognosis (p < 0.05). According to the STRING database, the functional protein of the MUC4 DSV gene interacted with MAGE1 expression. The causal inference model showed that deletion of the CEP72 DSV gene affected recurrence-free survival (RFS) in relation to IFIT1 expression.
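For readers unfamiliar with the AUC figure reported above, the following minimal sketch computes the area under the ROC curve from its rank-based definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels : 1 for cancer, 0 for non-cancer
    scores : predicted risk, higher means more likely cancer
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 case/control pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

On this scale 0.5 corresponds to chance ranking and 1.0 to perfect separation of cases from controls, which is the context for interpreting a value of 0.71.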
We established an explainable attention-weighted model for cancer risk prediction and used the survival-SVM for prognostic stratification, drawing on germline DSVs and immune gene expression datasets. Comprehensive assessment of germline DSVs can predict cancer risk and clinical outcomes in colon cancer patients.