FinaleToolkit: Accelerating Cell-Free DNA Fragmentation Analysis with a High-Speed Computational Toolkit.

Authors

Li James Wenhan, Bandaru Ravi, Liu Yaping

Affiliations

Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611.

Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL 60611.

Publication

bioRxiv. 2024 Dec 31:2024.05.29.596414. doi: 10.1101/2024.05.29.596414.

Abstract

Cell-free DNA (cfDNA) fragmentation patterns are a promising non-invasive biomarker for disease diagnosis and prognosis. Numerous fragmentation features, such as end motifs and the window protection score (WPS), have been characterized in cfDNA genomic sequencing. However, the analytical tools developed in these studies are often not released to the liquid biopsy community or are inefficient for genome-wide analysis of large datasets. To address this gap, we have developed FinaleToolkit, a fast and memory-efficient Python package designed to generate comprehensive fragmentation features from large cfDNA genomic sequencing data. For instance, FinaleToolkit can generate genome-wide WPS features from a ~100X cfDNA whole-genome sequencing (WGS) dataset with over 1 billion fragments in 1.2 hours, up to ~50-fold faster than the original implementations on the same dataset. We have benchmarked FinaleToolkit against original approaches or implementations where possible, confirming its efficacy. Furthermore, FinaleToolkit enables genome-wide analysis of fragmentation patterns over arbitrary genomic intervals, significantly boosting performance for early cancer detection. FinaleToolkit is open source and thoroughly documented, with both a command-line interface and a Python application programming interface (API), to facilitate its widespread adoption and use within the research community: https://github.com/epifluidlab/FinaleToolkit.
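As a rough illustration of one feature named in the abstract, the sketch below computes a window protection score at a single genomic position in plain Python. It assumes the commonly used definition (fragments that fully span a window centered on the position, minus fragments with an endpoint inside that window). The function name `window_protection_score`, the half-open `(start, end)` fragment tuples, and the 120 bp default window are illustrative assumptions; this is not FinaleToolkit's API or implementation, and it ignores fragment-length filtering and any performance optimizations.

```python
from typing import Iterable, Tuple

# A fragment is a hypothetical (start, end) pair: 0-based, end-exclusive.
Fragment = Tuple[int, int]


def window_protection_score(
    fragments: Iterable[Fragment], position: int, window: int = 120
) -> int:
    """Naive WPS at one position (illustrative sketch only).

    Score = fragments fully spanning the window centered on `position`
            minus fragments with an endpoint inside that window.
    """
    half = window // 2
    win_start = position - half   # first base of the window
    win_end = position + half     # one past the last base of the window

    spanning = 0
    ending = 0
    for start, end in fragments:
        if start <= win_start and end >= win_end:
            # Fragment covers every base of the window.
            spanning += 1
        elif win_start <= start < win_end or win_start < end <= win_end:
            # First or last base of the fragment falls inside the window.
            ending += 1
    return spanning - ending


# Toy example: three fragments span the window [40, 160), one starts inside it.
toy_fragments = [(0, 200), (10, 190), (20, 180), (90, 150)]
score = window_protection_score(toy_fragments, position=100)
```

This loop is linear in the number of fragments for every position queried, so it only conveys the definition; genome-wide scoring over a billion fragments needs an indexed or vectorized approach of the kind the abstract describes.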

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98fe/11694816/c754b8e380fe/nihpp-2024.05.29.596414v2-f0001.jpg
