School of Computer Science, Qufu Normal University, Rizhao 276826, China.
Genes (Basel). 2022 Apr 19;13(5):716. doi: 10.3390/genes13050716.
Cancer is a complex disease caused by genomic and epigenetic alterations; hence, identifying meaningful cancer drivers is an important and challenging task. Most studies detect cancer drivers from mutational features, and few consider multi-omics characteristics as important factors. In this study, we present a framework for analyzing how multi-omics characteristics affect the identification of driver genes. Within this framework, we use four machine learning algorithms to detect cancer driver genes in pan-cancer data comprising 75 features across 19,636 genes. The 75 features are divided into four types and analyzed using Kullback-Leibler divergence, comparing CGC (Cancer Gene Census) genes with non-CGC genes. We detect cancer driver genes in two ways: from a single feature type, and from the top N features. The first analysis indicates that, among the four feature types, the mutational features perform best. The second analysis shows that the top 45 features form the most effective feature combination and outperform the mutational features alone. The top 45 features include not only mutational features but also features from the other three types. Our study therefore extends the detection of cancer driver genes and provides a more comprehensive understanding of cancer mechanisms.
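The abstract does not state how the Kullback-Leibler divergence was estimated or how the top N features were chosen, so the following is only a minimal sketch of one plausible reading: for each feature, histogram its values separately for CGC and non-CGC genes on shared bins, compute the divergence between the two histograms, and rank features by that score. The function names (`kl_divergence`, `feature_kl_scores`, `top_n_features`), the bin count, the smoothing constant `eps`, and the assumption that ranking follows the divergence score are all illustrative and are not taken from the paper.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two histograms; eps (an assumption) avoids log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def feature_kl_scores(X, is_cgc, bins=20):
    """Score each column of X by KL(CGC || non-CGC) over shared bins.

    X      : (n_genes, n_features) array of feature values
    is_cgc : boolean array of length n_genes, True for CGC genes
    bins   : number of histogram bins (illustrative choice)
    """
    X = np.asarray(X, dtype=float)
    is_cgc = np.asarray(is_cgc, dtype=bool)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        # Shared bin edges so both histograms are directly comparable.
        edges = np.histogram_bin_edges(col, bins=bins)
        p, _ = np.histogram(col[is_cgc], bins=edges)
        q, _ = np.histogram(col[~is_cgc], bins=edges)
        scores[j] = kl_divergence(p, q)
    return scores


def top_n_features(scores, n):
    """Indices of the n highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:n]


# Synthetic demonstration only; these are not the study's data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 5))
labels_demo = np.zeros(2000, dtype=bool)
labels_demo[:200] = True          # pretend the first 200 genes are CGC genes
X_demo[labels_demo, 2] += 3.0     # make feature 2 differ between the groups
demo_scores = feature_kl_scores(X_demo, labels_demo)
demo_top = top_n_features(demo_scores, 2)
```

In a setting like the one described, `X` would hold the 75 feature columns for the 19,636 genes, and the selected columns would then be passed to whichever classifier is being evaluated. Histogram-based estimates depend on the bin count and smoothing, so the resulting ranking should be treated as approximate.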