Cui Wanxin, You Junjie, Jie Wenlong, Li Zihao, Peng Xiaoqing
Hunan Key Laboratory of Bioinformatics, School of Computer Science and Engineering, Central South University, LuShan Nan Road 932, Changsha, 410083, Hunan, China.
Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, LuShan Nan Road 932, Changsha, 410083, Hunan, China.
Methods. 2025 Sep;241:163-172. doi: 10.1016/j.ymeth.2025.05.013. Epub 2025 Jun 3.
The tissues of origin of circulating cell-free DNA (cfDNA) hold great promise for non-invasive cancer diagnosis, monitoring of allograft rejection, and prenatal testing. Many features for inferring the tissues of origin of cfDNA have been identified from different angles, including genetics, epigenetics, and fragmentomics, using whole-genome sequencing (WGS) and whole-genome bisulfite sequencing (WGBS) data of cfDNA. However, integrative toolkits for automatically extracting these features from the WGS and WGBS data of cfDNA samples are lacking. Here, we propose cfDNAFE, a comprehensive and easy-to-use Python package for extracting multi-omics features from aligned cfDNA sequencing data. It covers three aspects, namely cfDNA genetic features, cfDNA methylation features, and cfDNA fragmentation features, comprising 13 types of feature profiles. The genetic features include substitution mutations, mutation signatures, and copy number variations. The methylation features are the proportions of methylated, unmethylated, and mixed-methylation fragments on cell-type-specific markers. The fragmentation features cover fragment sizes, end/breakpoint motifs, and nucleosome positions. To verify the functions of cfDNAFE, we analyze the WGS/WGBS data of cfDNA samples using the feature profiles extracted by cfDNAFE. A comparison between cfDNA samples from hepatocellular carcinoma (HCC) patients and normal controls suggests that HCC cfDNA samples differ significantly in fragment-size-related features and breakpoint/end motif patterns, and have significantly higher orientation-aware cfDNA fragmentation (OCF) values in liver-specific open chromatin regions than healthy controls. In conclusion, cfDNAFE is the most comprehensive toolkit to date, covering the largest set of features used in existing studies for inferring the tissues of origin of cfDNA. It will help researchers build machine learning models for auxiliary diagnosis based on these features.
Availability and implementation: https://github.com/Cuiwanxin1998/cfDNAFE.
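Illustrative sketch: the feature families named in the abstract can be made concrete with a few lines of plain Python. The block below is not the cfDNAFE API; the function names, the 4-mer motif length, the 100-150 bp and 151-220 bp size windows, and the 75%/25% methylation thresholds are all assumptions chosen for illustration. It computes 5' end-motif frequencies, a short-fragment fraction, and a methylated/unmethylated/mixed label for a single fragment.

```python
from collections import Counter


def end_motif_frequencies(fragments, k=4):
    """Relative frequency of the first k bases (5' end) of each fragment.

    Fragments shorter than k or containing non-ACGT bases in the motif
    are skipped. k=4 is an assumed default, not a cfDNAFE setting.
    """
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def short_fragment_fraction(lengths, short=(100, 150), long=(151, 220)):
    """Fraction of short fragments among short plus long fragments.

    The size windows are assumed values for this sketch; fragments
    outside both windows are ignored.
    """
    n_short = sum(short[0] <= length <= short[1] for length in lengths)
    n_long = sum(long[0] <= length <= long[1] for length in lengths)
    denom = n_short + n_long
    return n_short / denom if denom else 0.0


def classify_fragment(cpg_states, min_cpgs=3, hi=0.75, lo=0.25):
    """Label one fragment by its CpG methylation pattern.

    cpg_states is a string with one character per CpG on the fragment:
    'C' for methylated, 'T' for unmethylated (a hypothetical encoding).
    Returns 'M' (methylated), 'U' (unmethylated), 'X' (mixed), or None
    when the fragment covers fewer than min_cpgs CpGs.
    """
    if len(cpg_states) < min_cpgs:
        return None
    methylated = cpg_states.count("C") / len(cpg_states)
    if methylated >= hi:
        return "M"
    if methylated <= lo:
        return "U"
    return "X"
```

In a real workflow the fragment sequences, lengths, and per-CpG states would be derived from the aligned WGS/WGBS reads; the per-sample motif frequencies, size fractions, and M/U/X proportions could then serve as input columns for a downstream classifier.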