A scalable distributed pipeline for reference-free variants calling.

Authors

Di Rocco Lorenzo, Ferraro Petrillo Umberto

Affiliation

Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy.

Publication information

BMC Genomics. 2025 Jun 3;26(Suppl 1):557. doi: 10.1186/s12864-025-11722-7.

Abstract

BACKGROUND

Precision medicine pipelines typically begin with variant calling, which identifies disease-related mutations to guide treatment selection. Reference-free approaches use a De Bruijn graph to assess variations in the genetic profiles of distinct individuals, without aligning reads to a reference genome. However, the timely analysis of large-scale sequencing data may exceed the capabilities of a single workstation and require alternative computational approaches.
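To make the De Bruijn graph idea concrete, the following is a minimal single-machine sketch, not the authors' pipeline. It assumes error-free reads and uses k-mers as nodes. It reports an isolated SNP when a node has exactly two successors whose unbranched paths rejoin after k steps and spell sequences differing in one base. All function names (kmers, build_graph, walk, find_isolated_snps) are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer (node) to the set of k-mers that directly follow it."""
    succ = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
    return succ


def walk(succ, start, steps):
    """Follow unique successors for `steps` hops; None if the path forks or ends."""
    path = [start]
    for _ in range(steps):
        nxt = succ.get(path[-1], set())  # .get avoids inserting new keys
        if len(nxt) != 1:
            return None
        path.append(next(iter(nxt)))
    return path


def find_isolated_snps(succ, k):
    """Return pairs of allele sequences (length 2k+1) for simple SNP bubbles.

    Simplification (assumption): only out-degrees are checked, so this is
    a toy detector rather than a complete bubble-finding algorithm.
    """
    bubbles = []
    for node, nexts in succ.items():
        if len(nexts) != 2:
            continue
        # Each branch holds k variant k-mers, then the shared closing node.
        paths = [walk(succ, s, k) for s in sorted(nexts)]
        if None in paths or paths[0][-1] != paths[1][-1]:
            continue
        a, b = (node + "".join(km[-1] for km in path) for path in paths)
        if sum(x != y for x, y in zip(a, b)) == 1:
            bubbles.append((a, b))
    return bubbles


if __name__ == "__main__":
    # Two toy "individuals" differing by one base (C vs A at index 6).
    reads = ["ACGTTGCATGGAC", "ACGTTGAATGGAC"]
    print(find_isolated_snps(build_graph(reads, 4), 4))
```

With k = 4, the two toy reads yield the single pair ("GTTGAATGG", "GTTGCATGG"), whose middle base is the SNP. Real data would additionally need sequencing-error filtering (for example by k-mer abundance) and reverse-complement handling, both omitted here.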

RESULTS

We introduce the first known distributed pipeline for detecting isolated SNPs (Single Nucleotide Polymorphisms), which leverages the computational resources of multiple machines in parallel. The pipeline analyzes large datasets efficiently by using a distributed De Bruijn graph representation. Furthermore, we introduce a cluster-driven algorithm that partitions the De Bruijn graph across multiple independent machines according to the inner structure of the sequences under analysis, thus further improving the scalability of the pipeline.
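The abstract does not describe the cluster-driven partitioning algorithm, so the sketch below does not reproduce it. As a stand-in, it contrasts plain hash partitioning of k-mers with minimizer-based partitioning, a different and well-known structure-aware technique, to show why the placement rule matters: every graph edge whose endpoints land on different partitions implies communication between machines. All names (hash_partition, minimizer_partition, cut_edges) and the parameters k, m and n_parts are illustrative assumptions.

```python
import zlib


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def stable_hash(s):
    """Deterministic hash (Python's built-in str hash is salted per process)."""
    return zlib.crc32(s.encode("ascii"))


def hash_partition(kmer, n_parts):
    """Standard scheme: place a k-mer by hashing the whole k-mer."""
    return stable_hash(kmer) % n_parts


def minimizer(kmer, m):
    """Lexicographically smallest substring of length m inside the k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def minimizer_partition(kmer, n_parts, m):
    """Stand-in structure-aware scheme: place a k-mer by hashing its minimizer.

    Overlapping k-mers often share a minimizer, so neighbours in the
    De Bruijn graph tend to land on the same partition.
    """
    return stable_hash(minimizer(kmer, m)) % n_parts


def cut_edges(seq, k, assign):
    """Count consecutive k-mer pairs that `assign` puts on different partitions."""
    ks = kmers(seq, k)
    return sum(assign(a) != assign(b) for a, b in zip(ks, ks[1:]))


if __name__ == "__main__":
    seq = "ACGTTGCATGGACCTAGGATCCAGTTACGGATTCA"  # toy sequence, not real data
    k, m, n_parts = 11, 5, 4
    print("hash:", cut_edges(seq, k, lambda x: hash_partition(x, n_parts)))
    print("minimizer:", cut_edges(seq, k, lambda x: minimizer_partition(x, n_parts, m)))
```

Comparing the two printed counts on a given input gives a rough feel for how much cross-partition traffic each rule would cause. The actual speed-up reported by the authors comes from their own algorithm and cannot be inferred from this toy comparison.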

CONCLUSIONS

The results of our experiments, conducted on real-world datasets, show that our pipeline performs well in terms of efficiency, output quality and scalability. Moreover, the results confirm that adopting a specialized partitioning algorithm for the distributed representation of the De Bruijn graph yields a significant speed-up compared to standard partitioning techniques.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/51970992f0a0/12864_2025_11722_Fig1_HTML.jpg
