Palizban Fahimeh, Sarbishegi Mohammadmahdi, Kavousi Kaveh, Mehrmohamadi Mahya
Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.
Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran.
Heliyon. 2024 Oct 15;10(20):e39379. doi: 10.1016/j.heliyon.2024.e39379. eCollection 2024 Oct 30.
MOTIVATION: Distinguishing pathogenic cancer-associated mutations from other somatic variants present in cell-free DNA (cfDNA) is one of the challenges in the field of liquid biopsy. This distinction is critical, since misclassifying mutations that stem from clonal hematopoiesis (CH) as tumor-derived, or vice versa, could result in inaccurate diagnoses and inappropriate therapeutic interventions for patients. RESULTS: We addressed this by developing a specialized machine learning technique that distinguishes tumor-derived from CH-related mutations in cfDNA. We established a comprehensive in-house reference catalog comprising approximately 25,000 single nucleotide variants (SNVs), each linked to either a tumor or a CH origin. This reference serves as the foundation for training a deep learning model based on the semi-supervised generative adversarial network (SSGAN) architecture. By analyzing the genomic coordinates and nucleotide composition of cfDNA variants, our model achieves an area under the curve (AUC) of 95% in classifying uncharacterized variants as CH- or tumor-derived. In conclusion, our research highlights the potential of predicting variant origin from genomic features in cfDNA data as a robust alternative to conventional multi-analyte sequencing methods. This approach not only improves the accuracy of distinguishing CH from tumor mutations in liquid biopsy data, but also underscores the potential of advanced data analysis and machine learning techniques in genomics and personalized medicine. AVAILABILITY: https://github.com/FPalizban/SSGAN.
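The abstract names the SSGAN architecture without describing how a GAN becomes a classifier. A common semi-supervised GAN formulation (Salimans et al., 2016) gives the discriminator K logits for the K real classes, here assumed to be K = 2 (0 = CH, 1 = tumor), and treats "generated" as an implicit extra class with its logit fixed at 0. The discriminator is then trained with a supervised cross-entropy on labeled variants plus an unsupervised real-versus-fake term on unlabeled and generated samples. The NumPy sketch below shows only those two loss terms and the inference rule. It is an assumption that the authors' repository uses exactly this variant; the network, the generator update, and the encoding of genomic coordinates and nucleotide composition are all omitted, and the function names are hypothetical.

```python
import numpy as np


def logsumexp(a):
    """Numerically stable log(sum(exp(a))) over the last axis."""
    m = np.max(a, axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=-1, keepdims=True))).squeeze(-1)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def supervised_loss(logits, labels):
    """Cross-entropy over the K real classes for labeled variants.

    logits: (n, K) discriminator outputs; labels: (n,) ints
    (hypothetical encoding: 0 = CH, 1 = tumor).
    """
    lse = logsumexp(logits)
    picked = logits[np.arange(len(labels)), labels]
    return float(np.mean(lse - picked))


def unsupervised_loss(logits_unlabeled, logits_fake):
    """Real-vs-fake loss with the fake-class logit fixed at 0.

    With Z(x) = sum_k exp(l_k(x)), D(x) = Z / (Z + 1), so
    -log D(x)       = softplus(lse) - lse
    -log(1 - D(x))  = softplus(lse)
    """
    lse_real = logsumexp(logits_unlabeled)
    lse_fake = logsumexp(logits_fake)
    loss_real = np.mean(softplus(lse_real) - lse_real)
    loss_fake = np.mean(softplus(lse_fake))
    return float(loss_real + loss_fake)


def predict(logits):
    """At inference, take a softmax over the K real classes only."""
    probs = np.exp(logits - logsumexp(logits)[:, None])
    return probs.argmax(axis=1), probs


if __name__ == "__main__":
    # Toy logits for three variants; values are illustrative, not real data.
    logits = np.array([[2.0, -1.0], [0.0, 3.0], [0.5, 0.4]])
    labels = np.array([0, 1, 0])
    print(supervised_loss(logits, labels))
    print(unsupervised_loss(logits, np.zeros((3, 2))))
    print(predict(logits)[0])
```

Fixing the fake-class logit at 0 removes a redundant degree of freedom from the softmax, which is why only K outputs are needed even though the discriminator effectively separates K + 1 classes.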