Yuan Dongsheng, Jugas Robin, Pokorna Petra, Sterba Jaroslav, Slaby Ondrej, Schmid Simone, Siewert Christin, Osberg Brendan, Capper David, Halldorsson Skarphedinn, Vik-Mo Einar O, Zeiner Pia S, Weber Katharina J, Harter Patrick N, Thomas Christian, Albers Anne, Rechsteiner Markus, Reimann Regina, Appelt Anton, Schüller Ulrich, Jabareen Nabil, Mackowiak Sebastian, Ishaque Naveed, Eils Roland, Lukassen Sören, Euskirchen Philipp
Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
Center of Digital Health, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
Nat Cancer. 2025 Jun 6. doi: 10.1038/s43018-025-00976-5.
DNA methylation-based classification of (brain) tumors has emerged as a powerful and indispensable diagnostic technique. Initial implementations used methylation microarrays for data generation, while most current classifiers rely on a fixed methylation feature space. This makes them incompatible with other platforms, especially different flavors of DNA sequencing. Here, we describe crossNN, a neural network-based machine learning framework that can accurately classify tumors using sparse methylomes obtained on different platforms and with different epigenome coverage and sequencing depth. It outperforms other deep and conventional machine learning models regarding accuracy and computational requirements while still being explainable. We use crossNN to train a pan-cancer classifier that can discriminate more than 170 tumor types across all organ sites. Validation in more than 5,000 tumors profiled on different platforms, including nanopore and targeted bisulfite sequencing, demonstrates its robustness and scalability with 99.1% and 97.8% precision for the brain tumor and pan-cancer models, respectively.
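The abstract does not spell out the architecture or training procedure, so the following is only a minimal sketch of one plausible way to make a classifier tolerate sparse methylomes from different platforms. It is not the published crossNN implementation. It assumes a single-layer softmax model, a ternary encoding of beta values (+1 methylated, -1 unmethylated, 0 not covered), and random masking of CpG sites during training. The function names, the 0.6 binarization threshold, the 0.9 masking rate, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def encode(beta, threshold=0.6):
    """Ternary encoding (assumed): +1 methylated, -1 unmethylated, 0 missing (NaN)."""
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    seen = ~np.isnan(beta)
    out[seen] = np.where(beta[seen] > threshold, 1.0, -1.0)
    return out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, n_classes, mask_rate=0.9, epochs=200, lr=0.1, seed=0):
    """Fit a single-layer softmax classifier by gradient descent.

    In every epoch a fresh random subset of sites is zeroed out, so the model
    sees inputs that resemble low-coverage profiles (assumed training trick).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        keep = rng.random(X.shape) >= mask_rate
        Xm = X * keep
        P = softmax(Xm @ W)
        W -= lr * Xm.T @ (P - Y) / n  # cross-entropy gradient step
    return W

def predict(W, X):
    """Class probabilities; sites encoded as 0 contribute nothing to the score."""
    return softmax(np.atleast_2d(X) @ W)

def simulate(prototypes, n_per_class, rng, flip=0.05, missing=0.0):
    """Toy data: noisy copies of per-class methylation patterns, optionally sparse."""
    y = np.repeat(np.arange(len(prototypes)), n_per_class)
    state = prototypes[y] ^ (rng.random((len(y), prototypes.shape[1])) < flip)
    beta = np.where(state, 0.9, 0.1)
    beta[rng.random(beta.shape) < missing] = np.nan
    return beta, y

# Simulated example: 3 tumor classes, 200 CpG sites, dense training profiles,
# test profiles with about 80% of sites missing.
rng = np.random.default_rng(42)
prototypes = rng.random((3, 200)) < 0.5
train_beta, train_y = simulate(prototypes, 20, rng)
test_beta, test_y = simulate(prototypes, 20, rng, missing=0.8)

W = train(encode(train_beta), train_y, n_classes=3)
probs = predict(W, encode(test_beta))
accuracy = float((probs.argmax(axis=1) == test_y).mean())
print(f"accuracy on sparse test profiles: {accuracy:.2f}")
```

Because uncovered sites are encoded as 0, they drop out of the linear score, so a single weight matrix can be applied to microarray, nanopore, or targeted sequencing profiles regardless of which sites each one covers. The learned weights can also be inspected per site and class, which is one simple route to the kind of explainability the abstract mentions.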