Department of Biotechnology, Bhupat Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India.
Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology Madras, Chennai, India.
Sci Rep. 2022 Jan 7;12(1):5. doi: 10.1038/s41598-021-04015-y.
An emerging area of cancer genomics is the identification of driver genes, which confer a selective growth advantage to the cell. While several driver genes have been discovered, many remain undiscovered, especially those mutated at a low frequency across samples. This study defines new features and builds a pan-cancer model, cTaG, to identify new driver genes. The features capture the functional impact of the mutations as well as their recurrence across samples, which helps build a model that is not biased against genes mutated at low frequency. The model classifies genes into the two functional categories of driver genes, tumour suppressor genes (TSGs) and oncogenes (OGs), which have distinct mutation-type profiles. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Further, cTaG was employed to identify tissue-specific driver genes. Known cancer driver genes that cTaG predicts as TSGs with high probability include ARID1A, TP53, and RB1. In addition to these known genes, the potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an OG. Overall, our approach surmounts the issues of low recall and bias towards genes with high mutation rates, and predicts potential new driver genes for further experimental screening. cTaG is available at https://github.com/RamanLab/cTaG.
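To make the idea of mutation-type profiles and recurrence concrete, the following is a minimal sketch, not the cTaG feature set itself. It assumes hypothetical mutation records of the form (gene, sample, mutation type) and computes, per gene, the fraction of each mutation type and the fraction of samples in which the gene is mutated. The gene names, sample identifiers, mutation-type list and the `gene_features` helper are all illustrative assumptions. Because the type fractions are divided by the gene's own mutation count, a rarely mutated gene and a frequently mutated gene with the same profile receive the same values, which is one way ratio-style features can avoid favouring highly mutated genes.

```python
from collections import Counter, defaultdict

# Hypothetical mutation records: (gene, sample_id, mutation_type).
# These are made-up values for illustration, not data from the study.
MUTATIONS = [
    ("GENE_A", "s1", "nonsense"),
    ("GENE_A", "s2", "nonsense"),
    ("GENE_A", "s3", "frameshift"),
    ("GENE_A", "s4", "missense"),
    ("GENE_B", "s1", "missense"),
    ("GENE_B", "s2", "missense"),
    ("GENE_B", "s2", "missense"),
    ("GENE_B", "s5", "silent"),
]

# Assumed, simplified set of mutation types.
MUTATION_TYPES = ("missense", "nonsense", "frameshift", "silent")


def gene_features(mutations, n_samples):
    """Return {gene: features} with mutation-type fractions and recurrence."""
    type_counts = defaultdict(Counter)  # gene -> mutation type -> count
    samples = defaultdict(set)          # gene -> distinct mutated samples
    for gene, sample, mtype in mutations:
        type_counts[gene][mtype] += 1
        samples[gene].add(sample)

    features = {}
    for gene, counts in type_counts.items():
        total = sum(counts.values())
        # Fractions are relative to the gene's own mutation count.
        row = {f"frac_{t}": counts[t] / total for t in MUTATION_TYPES}
        # Recurrence: share of the cohort with at least one mutation in the gene.
        row["sample_recurrence"] = len(samples[gene]) / n_samples
        features[gene] = row
    return features


features = gene_features(MUTATIONS, n_samples=5)
for gene, row in sorted(features.items()):
    print(gene, row)
```

In this toy input, GENE_A is dominated by truncating mutations (nonsense and frameshift), the kind of profile the abstract associates with TSGs, while GENE_B is dominated by missense mutations. A classifier trained on such per-gene rows would be a separate step that this sketch does not cover.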