Otto Raik, Detjen Katharina M, Riemer Pamela, Fattohi Melanie, Grötzinger Carsten, Rindi Guido, Wiedenmann Bertram, Sers Christine, Leser Ulf
Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, 10099 Berlin, Germany.
Department of Hepatology and Gastroenterology, Charité-Universitätsmedizin Berlin, Campus Virchow-Klinikum and Campus Charité Mitte, 13353 Berlin, Germany.
Cancers (Basel). 2023 Feb 1;15(3):936. doi: 10.3390/cancers15030936.
Pancreatic neuroendocrine neoplasms (panNENs) are a rare yet diverse type of neoplasia whose precise clinical-pathological classification is frequently challenging. Because incorrect classifications can affect treatment decisions, additional tools that support diagnosis, such as machine learning (ML) techniques, are critically needed but generally unavailable owing to the scarcity of suitable ML training data for rare panNENs. Here, we demonstrate that a multi-step ML framework predicts clinically relevant panNEN characteristics while being trained exclusively on widely available data of healthy origin. The approach classifies panNENs by deconvolving their transcriptomes into cell type proportions based on gene expression profiles shared with healthy pancreatic cell types. The deconvolution results were found to have prognostic value for predicting overall patient survival time, neoplastic grading, and carcinoma versus tumor subclassification. A proliferation rate-agnostic deconvolution ML model predicted these clinical characteristics with a performance comparable to that of a baseline model trained on proliferation rate-informed levels. The approach is novel in that it complements established proliferation rate-oriented classification schemes, whose results it can reproduce and further refine by differentiating between identically graded subgroups. By including non-endocrine cell types, the deconvolution approach furthermore provides an in silico quantification of panNEN dedifferentiation, optimizing it for challenging clinical classification tasks in more aggressive panNEN subtypes.
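The deconvolution step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the algorithm used, so the example below substitutes a generic non-negative least squares fit solved with multiplicative updates; it should not be read as the authors' method. The signature values, the gene and cell type labels, and the helper names `mix` and `deconvolve` are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: estimates cell type proportions from a bulk
# expression profile using non-negative least squares (multiplicative updates).
# This is a stand-in technique, not the algorithm used in the paper.

# Hypothetical signature matrix: rows are marker genes, columns are
# healthy pancreatic cell types. All numbers are made up.
CELL_TYPES = ["alpha", "beta", "ductal"]
SIGNATURE = [
    [9.0, 1.0, 0.5],  # gene enriched in alpha cells
    [1.0, 8.0, 0.5],  # gene enriched in beta cells
    [0.5, 0.5, 7.0],  # gene enriched in ductal cells
    [2.0, 2.0, 2.0],  # uninformative housekeeping-like gene
]


def mix(signature, proportions):
    """Simulate a bulk profile as a weighted sum of cell type signatures."""
    return [
        sum(row[k] * proportions[k] for k in range(len(proportions)))
        for row in signature
    ]


def deconvolve(signature, bulk, iterations=2000, eps=1e-12):
    """Return non-negative cell type proportions that sum to 1.

    Minimizes ||signature @ w - bulk||^2 subject to w >= 0 using the
    multiplicative update w <- w * (A^T b) / (A^T A w), which keeps w
    non-negative when all inputs are non-negative.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Precompute A^T b and A^T A once.
    atb = [
        sum(signature[g][k] * bulk[g] for g in range(n_genes))
        for k in range(n_types)
    ]
    ata = [
        [
            sum(signature[g][i] * signature[g][j] for g in range(n_genes))
            for j in range(n_types)
        ]
        for i in range(n_types)
    ]
    w = [1.0] * n_types  # strictly positive starting point
    for _ in range(iterations):
        w = [
            w[k] * atb[k] / (sum(ata[k][j] * w[j] for j in range(n_types)) + eps)
            for k in range(n_types)
        ]
    total = sum(w)
    # Normalize so the weights can be read as proportions.
    return [x / total for x in w] if total > 0 else w


if __name__ == "__main__":
    # Simulated tumor profile with a known (hypothetical) composition.
    bulk_profile = mix(SIGNATURE, [0.6, 0.3, 0.1])
    estimate = deconvolve(SIGNATURE, bulk_profile)
    for name, share in zip(CELL_TYPES, estimate):
        print(f"{name}: {share:.3f}")
```

In this toy setting, a rising share of a non-endocrine column such as the hypothetical `ductal` one would be the kind of signal the abstract describes as an in silico quantification of dedifferentiation. The estimated proportions could then serve as input features for a downstream classifier, which is outside the scope of this sketch.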