Mouhou Elyas, Genty Fabien, El M'selmi Walid, Chouali Hanae, Zagury Jean-François, Le Clerc Sigrid, Proudhon Charlotte, Noirel Josselin
Laboratoire GBCM (EA7528), Conservatoire national des arts et métiers (CNAM), Paris, France.
Infotel Conseil, 13, rue Madeleine-Michelis, Neuilly-sur-Seine, France.
Sci Rep. 2025 Apr 15;15(1):12941. doi: 10.1038/s41598-024-82393-9.
Several studies have made it possible to envision a translational application of plasma DNA sequencing in cancer diagnosis and monitoring. However, the extremely low concentration of circulating tumour DNA (ctDNA) fragments among the total cell-free DNA (cfDNA) remains a formidable challenge, and statistical models still need to improve before they can be of practical use. In this study, we set about appraising the predictive value of a variety of binary classification models based on cfDNA sequencing, using fragmentation features extracted around transcription start sites (TSSs). We investigated (1) features summarising mapped fragment density around each TSS, (2) long non-coding RNA (lncRNA) genes versus coding genes and (3) the selection criteria used to define the gene classes that the model assigns. Given that, in healthy samples, most of the cfDNA comes from lymphomyeloid lineages, we could identify the model parametrisation with the best accuracy in those lineages using publicly available datasets of cfDNA from healthy individuals. Our results show that (1) the way tissue-specific gene classes are defined matters more than which fragmentation features are included, and (2) in particular, lncRNAs are more tissue specific than coding genes and stand out in terms of both sensitivity and specificity in our results.
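The first feature family, summarising mapped fragment density around each TSS, can be illustrated with a minimal sketch. It assumes fragments are already aligned and supplied as half-open (start, end) coordinates on the TSS's chromosome. The function names `tss_coverage_profile` and `summarise_tss`, the window sizes (a ±2,000 bp profile, a ±150 bp core, flanks beyond 1,000 bp) and the central-to-flank ratio are hypothetical choices made for illustration. They are not the features or parameters used in this study.

```python
from typing import Iterable, Tuple

import numpy as np


def tss_coverage_profile(
    fragments: Iterable[Tuple[int, int]], tss: int, window: int = 2000
) -> np.ndarray:
    """Per-base fragment coverage over [tss - window, tss + window).

    Hypothetical helper: fragments are half-open (start, end) intervals
    on the same chromosome as the TSS.
    """
    size = 2 * window
    lo = tss - window
    diff = np.zeros(size + 1, dtype=np.int64)  # difference array
    for start, end in fragments:
        # Clip each fragment to the window; skip fragments that miss it.
        s = max(start, lo) - lo
        e = min(end, lo + size) - lo
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    # Cumulative sum of the difference array gives per-base coverage.
    return np.cumsum(diff[:-1])


def summarise_tss(
    profile: np.ndarray, window: int = 2000, core: int = 150, flank: int = 1000
) -> dict:
    """Hypothetical density summaries for one TSS coverage profile."""
    centre = profile[window - core : window + core].mean()
    # Flanks: positions at least `flank` bp away from the TSS on both sides.
    background = np.concatenate(
        [profile[: window - flank], profile[window + flank :]]
    ).mean()
    ratio = centre / background if background > 0 else float("nan")
    return {
        "central_mean": float(centre),
        "flank_mean": float(background),
        "central_to_flank": float(ratio),
    }


if __name__ == "__main__":
    tss = 100_000
    # Toy data: coverage in both flanks, none over the TSS core.
    frags = [(tss - 2000, tss - 1000), (tss + 1000, tss + 2000)]
    print(summarise_tss(tss_coverage_profile(frags, tss)))
```

A ratio below 1 reflects a coverage dip at the TSS relative to its flanks. Per-TSS summaries of this kind could then be aggregated over a gene class and passed to a binary classifier.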