School of Basic Education, Changsha Aeronautical Vocational and Technical College, Changsha, Hunan 410124, China.
College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Math Biosci Eng. 2021 Sep 7;18(6):7711-7726. doi: 10.3934/mbe.2021382.
Tumor heterogeneity significantly increases the difficulty of tumor treatment: the same drugs and treatment methods have different effects on different tumor subtypes. Tumor heterogeneity is therefore one of the main sources of poor prognosis, recurrence, and metastasis. Several computational methods already study tumor heterogeneity at the genomic, transcriptomic, and histological levels, but these methods still have limitations. In this study, we propose an epistasis and heterogeneity analysis method based on genomic single nucleotide polymorphism (SNP) data. First, we design a maximum-correlation and maximum-consistency criterion, based on the Bayesian network score and information entropy, for evaluating genomic epistasis. As the number of SNPs increases, the space of epistatic combinations grows sharply, resulting in a combinatorial explosion. We therefore use an improved genetic algorithm to search the SNP epistatic combination space and identify potential feasible epistasis solutions. Multiple epistasis solutions represent different pathogenic gene combinations, which may lead to different tumor subtypes, that is, heterogeneity. Finally, an XGBoost classifier is trained on the feature SNPs that make up the multiple sets of epistatic solutions, to verify that accounting for tumor heterogeneity improves the accuracy of tumor subtype prediction. To demonstrate the effectiveness of our method, we evaluate its power to recognize multiple epistatic solutions and its accuracy in tumor subtype classification. Extensive simulation results show that our method achieves better power and prediction accuracy than previous methods.
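The search step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the maximum-correlation and maximum-consistency criterion, so plain mutual information between the joint genotype and a binary phenotype is used here as a stand-in score, and the genetic algorithm is a basic elitist variant rather than the improved one. The function names (`entropy`, `mutual_information`, `genetic_search`), the 0/1/2 genotype coding, and all parameter values are assumptions made for illustration. The XGBoost classification step is omitted.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(genotypes, phenotype, snps):
    """I(joint genotype of `snps`; phenotype). Stand-in score, not the paper's criterion."""
    # Encode the joint genotype (each SNP coded 0/1/2) as a base-3 integer.
    combo = np.zeros(genotypes.shape[0], dtype=np.int64)
    for s in snps:
        combo = combo * 3 + genotypes[:, s]
    # Phenotype is assumed binary (0/1), so combo*2 + phenotype is a unique joint code.
    joint = combo * 2 + phenotype
    return entropy(combo) + entropy(phenotype) - entropy(joint)

def genetic_search(genotypes, phenotype, k=2, pop_size=40,
                   generations=60, mutation_rate=0.3, seed=0):
    """Elitist genetic search over k-SNP combinations; returns (best_combo, score)."""
    rng = np.random.default_rng(seed)
    n_snps = genotypes.shape[1]

    def fitness(ind):
        return mutual_information(genotypes, phenotype, ind)

    # Each individual is a sorted tuple of k distinct SNP indices.
    population = [
        tuple(sorted(int(s) for s in rng.choice(n_snps, size=k, replace=False)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            # Crossover: draw k SNPs from the union of the two parents.
            pool = list(set(elite[a]) | set(elite[b]))
            child = list(rng.choice(pool, size=k, replace=False))
            # Mutation: swap one SNP for a SNP not already in the child.
            candidates = [s for s in range(n_snps) if s not in child]
            if candidates and rng.random() < mutation_rate:
                child[rng.integers(k)] = rng.choice(candidates)
            children.append(tuple(sorted(int(s) for s in child)))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    # Synthetic data: the phenotype depends only on the interaction of SNPs 3 and 7.
    rng = np.random.default_rng(1)
    g = rng.integers(0, 3, size=(500, 20))
    y = (g[:, 3] + g[:, 7]) % 2
    print(genetic_search(g, y, k=2))
```

Running the search with several seeds, or keeping the top few distinct individuals instead of only the best one, would be one way to collect multiple candidate epistasis solutions in the spirit of the heterogeneity analysis described above.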