Wang Haichao, Mennea Paulius D, Chan Yu Kiu Elkie, Cheng Zhao, Neofytou Maria C, Surani Arif Anwer, Vijayaraghavan Aadhitthya, Ditter Emma-Jane, Bowers Richard, Eldridge Matthew D, Shcherbo Dmitry S, Smith Christopher G, Markowetz Florian, Cooper Wendy N, Kaplan Tommy, Rosenfeld Nitzan, Zhao Hui
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
Cancer Research UK Cambridge Centre, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
Genome Biol. 2025 May 23;26(1):141. doi: 10.1186/s13059-025-03607-5.
Fragmentomic features of cell-free DNA (cfDNA) are promising non-invasive biomarkers for cancer diagnosis. However, biases in feature quantification have not been systematically evaluated, which hinders the adoption of such applications. We compare features derived from whole-genome sequencing of ten healthy donors using nine library preparation kits and ten data-processing routes, and we validate the findings in 1182 plasma samples from published studies. Our results clarify the variation introduced by library preparation and by feature quantification methods. We design the Trim Align Pipeline and the cfDNAPro R package as unified interfaces for data pre-processing, feature extraction, and visualization, in order to standardize multi-modal feature engineering and integration for machine learning.