Röner Sebastian, Burkard Lea, Speicher Michael R, Kircher Martin
Berlin Institute of Health (BIH) at Charité-Universitätsmedizin Berlin, 10178 Berlin, Germany.
University of Potsdam, Institute for Biochemistry and Biology, 14469 Potsdam, Germany.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae102.
Cell-free DNA (cfDNA), a broadly applicable biomarker commonly sourced from urine or blood, is extensively used for research and diagnostic applications. In various settings, genetic and epigenetic information is derived from cfDNA. However, a unified framework for its processing is lacking, limiting the universal application of innovative analysis strategies and the joining of data sets.
Here, we describe cfDNA UniFlow, a unified, standardized, and ready-to-use workflow for processing cfDNA samples. The workflow is written in Snakemake and can be scaled from stand-alone computers to cluster environments. It includes methods for processing raw genome sequencing data as well as specialized approaches for correcting sequencing errors, filtering, and quality control. Sophisticated methods for detecting copy number alterations and estimating and correcting GC-related biases are readily incorporated. Furthermore, it includes methods for extracting, normalizing, and visualizing coverage signals around user-defined regions in case-control settings. Ultimately, all results and metrics are aggregated in a unified report, enabling easy access to a wide variety of information for further research and downstream analysis.
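To make the idea of extracting and normalizing coverage signals around user-defined regions concrete, the following is a minimal Python sketch. It is not the cfDNA UniFlow implementation. The function names (`extract_window`, `normalized_profile`), the parameters (`half_width`, `flank`), and the choice to normalize by the mean of the outer flanks are assumptions made for illustration only.

```python
from statistics import mean


def extract_window(coverage, center, half_width):
    """Return per-base coverage for [center - half_width, center + half_width]."""
    start, end = center - half_width, center + half_width + 1
    if start < 0 or end > len(coverage):
        raise ValueError("window extends beyond the coverage track")
    return coverage[start:end]


def normalized_profile(coverage, centers, half_width, flank):
    """Average coverage around region centers, scaled by the flanking mean.

    Hypothetical normalization: divide the aggregated profile by the mean of
    its outermost `flank` positions on each side, so flanks sit near 1.0.
    """
    if flank < 1 or 2 * flank > 2 * half_width + 1:
        raise ValueError("flank must be >= 1 and fit inside the window")
    windows = [extract_window(coverage, c, half_width) for c in centers]
    # Aggregate position by position across all regions.
    profile = [mean(column) for column in zip(*windows)]
    baseline = mean(profile[:flank] + profile[-flank:])
    if baseline == 0:
        raise ValueError("flanking coverage is zero; cannot normalize")
    return [value / baseline for value in profile]


# Toy coverage track with a dip at positions 5 and 15.
track = [10] * 20
track[5] = 5
track[15] = 5
case_profile = normalized_profile(track, centers=[5, 15], half_width=2, flank=1)
```

In a case-control setting, the same function could be applied to a case track and a control track, and the two normalized profiles compared position by position.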
We provide an automated pipeline for processing cell-free DNA sampled from liquid biopsies, including a wide variety of additional functionalities like bias correction and signal extraction. With our focus on scalability and extensibility, we provide a foundation for future cfDNA research and faster clinical applications. The source code and extensive documentation are available on our GitHub repository (https://github.com/kircherlab/cfDNA-UniFlow).