School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
International School of Cosmetics, School of Perfume and Aroma Technology, Shanghai Institute of Technology, Shanghai, China.
Interdiscip Sci. 2023 Sep;15(3):405-418. doi: 10.1007/s12539-023-00563-1. Epub 2023 May 29.
DNA methylation-based precision early tumor diagnostics is emerging as a state-of-the-art technology that could capture early signs of cancer 3-5 years in advance, even within clinically homogeneous groups. At present, the sensitivity of early detection for many tumors is about 30%, which needs significant improvement. Genome-wide DNA methylation data, however, make it possible to comprehensively characterize a tumor's entire molecular genetic landscape and its subtle differences. Novel high-performance methods are therefore needed that draw unbiased information from the abundantly available DNA methylation data. To fill this gap, we designed a computational model that combines a self-attention graph convolutional network with a multi-class support vector machine to identify the 11 most common cancers from DNA methylation data. The self-attention graph convolutional network automatically learns key methylation sites in a data-driven way. Multi-tumor early diagnostics is then realized by training a multi-class support vector machine on the selected methylation sites. We evaluated the model's performance in experiments on several data sets, and the results demonstrate the effectiveness of the selected key methylation sites, which are highly relevant for blood diagnosis.

Graphical abstract: The pipeline of the self-attention graph convolutional network-based computational framework.
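
The abstract does not specify how the self-attention graph convolutional network scores methylation sites, so the following is only a minimal sketch of one common way such a selection step can work. It assumes a SAGPool-style scoring layer, z = tanh(Â X θ), where Â is the symmetrically normalized adjacency matrix with self-loops. The toy graph, the node features, the weight vector `theta`, and all function names are hypothetical and chosen purely for illustration; in a trained model `theta` would be learned from data rather than fixed by hand.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so every node also attends to its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def attention_scores(adj, features, theta):
    """Score each node with one graph-convolution layer: tanh(A_norm @ X @ theta)."""
    a_norm = normalize_adjacency(adj)
    n = len(adj)
    # Project each node's feature vector to a scalar with the weight vector.
    projected = [sum(x * w for x, w in zip(row, theta)) for row in features]
    # Aggregate the projected values over each node's neighbourhood.
    return [math.tanh(sum(a_norm[i][j] * projected[j] for j in range(n))) for i in range(n)]


def select_top_sites(scores, k):
    """Return the indices of the k highest-scoring nodes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy input: 4 methylation sites connected in a path 0-1-2-3,
# each with 2 illustrative feature values (not real methylation data).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.5, 0.5],
    [0.2, 0.7],
]
theta = [1.0, -1.0]  # hand-picked stand-in for learned weights

scores = attention_scores(adj, features, theta)
selected = select_top_sites(scores, k=2)
print(selected)
```

In the two-stage design the abstract describes, the sites retained by a step like this would then serve as the input features for the multi-class support vector machine.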