Zhao Zhi-Yang, Huang Chang-Ling, Wang Tong-Min, Zhou Shi-Hao, Pei Lu, Jia Wen-Hui, Jia Wei-Hua
School of Public Health, Sun Yat-sen University, Guangzhou 510080, China.
Sun Yat-sen University Cancer Center, Guangzhou 510060, China.
Diagnostics (Basel). 2025 May 1;15(9):1156. doi: 10.3390/diagnostics15091156.
Accurately discriminating between patients with and without cancer using cell-free DNA (cfDNA) is crucial for early cancer diagnosis. The end-motifs of cfDNA are significant cancer biomarkers and offer compelling prospects for cancer diagnosis. This study proposes EM-DeepSD, a signal decomposition deep learning framework based on cfDNA end-motifs, which aims to improve the accuracy of cancer diagnosis and to adapt to different sequencing modalities. The study included 146 patients diagnosed with cancer and 122 non-cancer controls. EM-DeepSD comprises three core modules. First, a signal decomposition module decomposes and reconstructs the input end-motif profiles, generating multiple regular subsequences that simplify the subsequent modeling. A machine learning module and a deep learning module are then both applied to improve the accuracy of cancer diagnosis. The performance of EM-DeepSD was compared with that of existing benchmark methods. Based on the EM-DeepSD framework, we developed the EM-DeepSSA model and compared it with two benchmark methods across different cfDNA sequencing datasets. In the internal validation set, EM-DeepSSA outperformed the two benchmark methods for cancer diagnosis (area under the curve (AUC), 0.920; adjusted p-value < 0.05). EM-DeepSSA also performed best on two independent external test sets that were subjected to 5-hydroxymethylcytosine sequencing (5hmCS) and broad-range cell-free DNA sequencing (BR-cfDNA-Seq), respectively (test set-1: AUC = 0.933; test set-2: AUC = 0.956; adjusted p-value < 0.05). In summary, we present a new framework that achieves high classification performance in cancer diagnosis and is applicable to different sequencing modalities.
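The decompose-and-reconstruct step described above can be illustrated with a minimal sketch. It assumes that the "SSA" in EM-DeepSSA refers to singular spectrum analysis, which the abstract does not state, and that an end-motif profile is a vector of 256 4-mer frequencies, which is also an assumption. The function name `ssa_decompose` and the `window` and `n_components` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def ssa_decompose(profile, window, n_components):
    """Split a 1-D profile into additive subsequences with basic SSA.

    Hypothetical helper for illustration only; not the authors' code.
    """
    x = np.asarray(profile, dtype=float)
    n = x.size
    k = n - window + 1
    # Embedding: build the (window x k) trajectory matrix of lagged copies.
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    # Decomposition: singular value decomposition of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = []
    for j in range(n_components):
        # Rank-1 elementary matrix for the j-th singular triple.
        elem = s[j] * np.outer(u[:, j], vt[j])
        # Reconstruction: average each anti-diagonal back into a series.
        flipped = elem[::-1]
        series = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
        components.append(series)
    return np.asarray(components)


# Synthetic stand-in for one sample's end-motif frequency profile.
rng = np.random.default_rng(0)
profile = rng.dirichlet(np.ones(256))
subsequences = ssa_decompose(profile, window=16, n_components=16)
```

Each row of `subsequences` has the same length as the input profile. When all components are kept, the rows sum back to the original profile, so the decomposition itself loses no information. How the resulting subsequences would be fed to the machine learning and deep learning modules is not specified in the abstract and is left out of this sketch.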