Oldenburg Jan, Wagner Jonas, Troschke-Meurer Sascha, Plietz Jessica, Kaderali Lars, Völzke Henry, Nauck Matthias, Homuth Georg, Völker Uwe, Simm Stefan
Institute of Bioinformatics, University Medicine Greifswald, 17475 Greifswald, Germany.
Institute for Bioanalysis, Department of Applied Sciences, Coburg University of Applied Sciences and Arts, 96450 Coburg, Germany.
Biomolecules. 2024 Nov 25;14(12):1501. doi: 10.3390/biom14121501.
The Explainable Modular Neural Network (XModNN) identifies biomarkers that support the classification of diseases and clinical parameters in transcriptomic datasets. The modules within XModNN represent specific pathways or genes of a functional hierarchy. Incorporating biological knowledge into the architectural design reduces the number of parameters. This is further reinforced by weighted multi-loss progressive training, which enables successful classification with a reduced number of replicates. Combining this workflow with layer-wise relevance propagation provides a robust post hoc explanation of each module's contribution. Two use cases, the prediction of sex and of neuroblastoma cell states, demonstrate that XModNN yields fewer candidate biomarkers than standard statistical approaches. Moreover, the architecture can be trained on a limited number of examples while attaining the same performance and robustness as support vector machines and random forests. The integrated pathway relevance analysis improves on standard gene set overrepresentation analysis, which relies solely on gene assignment. Two crucial genes and three pathways were identified for sex classification, while 26 genes and six pathways were highly important for discriminating adrenergic and mesenchymal cell states in neuroblastoma.
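The two ideas the abstract combines, pathway-constrained modules and layer-wise relevance propagation, can be illustrated with a toy sketch. This is not the authors' implementation: the gene and pathway names, the weights, and the helper functions `build_mask`, `count_parameters`, `forward`, and `lrp_epsilon` are all hypothetical, and biases, nonlinearities, the deeper hierarchy, and the weighted multi-loss training are left out. It only shows how restricting connections to annotated gene-pathway pairs lowers the parameter count, and how the LRP epsilon rule would redistribute a pathway's relevance back onto its member genes.

```python
# Toy sketch (assumed names and values, not taken from the paper).
GENES = ["g1", "g2", "g3", "g4"]
PATHWAYS = {"pwA": ["g1", "g2"], "pwB": ["g2", "g3", "g4"]}


def build_mask(genes, pathways):
    """mask[p][g] is 1 if gene g is annotated to pathway p, else 0."""
    return {p: {g: int(g in members) for g in genes}
            for p, members in pathways.items()}


def count_parameters(mask):
    """Number of trainable weights when only annotated pairs are connected."""
    return sum(sum(row.values()) for row in mask.values())


def forward(x, weights, mask):
    """Linear pathway score: only annotated genes contribute to each module."""
    return {p: sum(weights[p][g] * mask[p][g] * x[g] for g in x)
            for p in mask}


def lrp_epsilon(x, weights, mask, relevance_out, eps=1e-9):
    """Redistribute pathway relevance onto genes with the LRP epsilon rule.

    Each gene receives a share of a pathway's relevance proportional to its
    contribution w * x to that pathway's score; eps stabilises the division.
    """
    z = forward(x, weights, mask)
    relevance_in = {g: 0.0 for g in x}
    for p, r in relevance_out.items():
        denom = z[p] + (eps if z[p] >= 0 else -eps)
        for g in x:
            contribution = weights[p][g] * mask[p][g] * x[g]
            relevance_in[g] += contribution / denom * r
    return relevance_in


if __name__ == "__main__":
    mask = build_mask(GENES, PATHWAYS)
    dense = len(GENES) * len(PATHWAYS)
    print("masked parameters:", count_parameters(mask), "dense:", dense)

    x = {"g1": 1.0, "g2": 2.0, "g3": 0.5, "g4": 1.0}
    weights = {p: {g: 1.0 for g in GENES} for p in PATHWAYS}
    z = forward(x, weights, mask)
    # Use the pathway scores themselves as the relevance to propagate.
    print("gene relevance:", lrp_epsilon(x, weights, mask, z))
```

In this toy setting a gene that belongs to several pathways (here `g2`) accumulates relevance from each of them, and the total relevance over genes stays approximately equal to the total over pathways, which is the conservation property that makes such relevance scores comparable across modules.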