Wagner Jonas, Oldenburg Jan, Nath Neetika, Simm Stefan
Institute of Bioinformatics, University Medicine Greifswald, 17475 Greifswald, Germany.
Institute of Bioanalysis, Department of Applied Sciences, Coburg University of Applied Sciences and Arts, 96450 Coburg, Germany.
Cancers (Basel). 2025 May 22;17(11):1731. doi: 10.3390/cancers17111731.
Background: The prediction of cancer types relies primarily on driver genes and their specific mutations. Advances in novel omics technologies have made additional genetic data available. Integrated with artificial intelligence models, these data have considerable potential to improve the accuracy of cancer diagnosis. Because mutational signatures can provide insight into malfunctioning repair mechanisms, they also have the potential to make cancer diagnosis more accurate.
Methods: First, we compared unsupervised and supervised machine learning approaches for predicting cancer types. We employed deep and artificial neural network architectures with an explainable component, such as layer-wise relevance propagation (LRP), to extract the features most relevant for cancer-type prediction. Ten-fold cross-validation and an extensive grid search were used to optimize the neural network architecture, with driver gene mutations, mutational signatures and topological mutation information as input. The PCAWG dataset was used to discriminate between 17 primary sites and 24 cancer types.
Results: Overall, our approach showed that using the whole genome, or its intergenic and intronic regions, instead of exome information increases the most relevant mutation information for discriminating between cancer types by more than 10%. Furthermore, for all but two cancer types, the most relevant features lie in the mutational signatures rather than in the topological mutation information.
Conclusions: Informative mutational signatures outperformed driver gene mutations in predicting cancer types and added a new layer of diagnostic information. Because the information content of mutational signatures is not based solely on mutation frequency, it is even possible to separate cancer types from the same primary site by their differing relevant mutations. Furthermore, comparing informative mutational signatures allowed specific impaired repair mechanisms to be assigned to cancer types.
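The abstract names layer-wise relevance propagation as the explainable component but does not state which LRP rule or network configuration was used. As a hedged illustration only, the sketch below applies the common LRP-epsilon rule to a toy dense ReLU network in plain Python. The function names (`dense`, `lrp_epsilon`, `explain`), the weights and the three input features are hypothetical and are not taken from the paper. The relevance of the explained class is redistributed layer by layer in proportion to each input's contribution a_j * w_jk to the pre-activation z_k, i.e. R_j = sum_k a_j * w_jk / (z_k + eps * sign(z_k)) * R_k.

```python
"""Minimal sketch of layer-wise relevance propagation (LRP-epsilon rule)
for a small dense ReLU network, written in plain Python.
Weights, sizes and function names are hypothetical illustrations."""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # w[j][k]: weight from input j to output k


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Pre-activations z_k = sum_j x_j * w[j][k] + b_k."""
    return [sum(x[j] * w[j][k] for j in range(len(x))) + b[k] for k in range(len(b))]


def relu(z: Vector) -> Vector:
    return [max(0.0, v) for v in z]


def lrp_epsilon(a: Vector, w: Matrix, b: Vector, r_out: Vector, eps: float = 1e-6) -> Vector:
    """Redistribute output relevance r_out onto the layer inputs a:
    R_j = sum_k a_j * w[j][k] / (z_k + eps * sign(z_k)) * R_k."""
    z = dense(a, w, b)
    # Stabilised ratio R_k / z_k; sign(0) is treated as +1 to avoid dividing by zero.
    s = [r_out[k] / (z[k] + (eps if z[k] >= 0 else -eps)) for k in range(len(z))]
    return [a[j] * sum(w[j][k] * s[k] for k in range(len(z))) for j in range(len(a))]


def explain(x: Vector, layers: List[Tuple[Matrix, Vector]], target: int) -> Vector:
    """Forward pass (ReLU hidden layers, linear output), then propagate the
    relevance of the `target` logit back onto the input features."""
    activations = [x]
    for i, (w, b) in enumerate(layers):
        z = dense(activations[-1], w, b)
        activations.append(relu(z) if i < len(layers) - 1 else z)
    logits = activations[-1]
    # Only the explained class carries relevance at the output.
    relevance = [logits[k] if k == target else 0.0 for k in range(len(logits))]
    # Walk backwards, pairing each layer with the activations that fed it.
    for (w, b), a in zip(reversed(layers), reversed(activations[:-1])):
        relevance = lrp_epsilon(a, w, b, relevance)
    return relevance


if __name__ == "__main__":
    # Toy network: 3 input features -> 2 hidden ReLU units -> 2 classes.
    layers = [
        ([[1.0, -1.0], [0.5, 1.0], [-1.0, 2.0]], [0.0, 0.0]),
        ([[1.0, 0.0], [0.5, 1.0]], [0.0, 0.0]),
    ]
    r = explain([1.0, 2.0, 0.5], layers, target=0)
    print([round(v, 3) for v in r])  # [0.5, 2.0, 0.0]
    # Feature indices ranked by relevance, most relevant first.
    print(sorted(range(len(r)), key=lambda j: r[j], reverse=True))  # [1, 0, 2]
```

With zero biases, the input relevances approximately sum to the explained logit (2.5 in the toy example). Ranking them is one way to pick out the most relevant features. In the paper's setting, the input positions would correspond to driver gene mutations, mutational signatures or topological mutation features; that mapping is an assumption of this sketch.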