A scalable distributed pipeline for reference-free variants calling.

Authors

Di Rocco Lorenzo, Ferraro Petrillo Umberto

Affiliation

Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy.

Publication information

BMC Genomics. 2025 Jun 3;26(Suppl 1):557. doi: 10.1186/s12864-025-11722-7.

DOI:10.1186/s12864-025-11722-7
PMID:40461964
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12131334/
Abstract

BACKGROUND

Precision medicine pipelines typically begin with variant calling, which identifies disease-related mutations to guide treatment selection. Reference-free approaches detect variations among the genetic profiles of distinct individuals using a de Bruijn graph built from the sequencing data, without aligning it to a reference genome. However, the timely analysis of large-scale sequencing data may exceed the capabilities of a single workstation, calling for alternative computational approaches.
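To make the graph-based idea concrete, here is a minimal single-machine sketch of how an isolated SNP can show up as a "bubble" in a de Bruijn graph: two individuals share a k-mer, diverge for k k-mers, then reconverge. This is an illustration under simplifying assumptions (error-free sequences, no reverse complements, no repeated k-mers), not the pipeline described in the paper. All function names and the two toy sequences are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_graph(sequences, k):
    """Map each k-mer to the set of k-mers that follow it in some sequence."""
    succ = defaultdict(set)
    for seq in sequences:
        ks = kmers(seq, k)
        for u, v in zip(ks, ks[1:]):
            succ[u].add(v)
    return succ

def walk(succ, start, steps):
    """Follow unique successors for `steps` edges; None on a branch or dead end."""
    path = [start]
    for _ in range(steps):
        nxt = succ.get(path[-1], set())
        if len(nxt) != 1:
            return None
        path.append(next(iter(nxt)))
    return path

def isolated_snps(succ, k):
    """Return (left_context, allele_a, allele_b) for each simple SNP bubble."""
    found = []
    for node, nxt in list(succ.items()):
        if len(nxt) != 2:
            continue  # not a two-way branch
        a, b = sorted(nxt)
        # A single substitution alters exactly k consecutive k-mers, so both
        # branches must rejoin at the same k-mer after k edges.
        path_a, path_b = walk(succ, a, k), walk(succ, b, k)
        if path_a is None or path_b is None:
            continue
        same_end = path_a[-1] == path_b[-1]
        disjoint = not set(path_a[:-1]) & set(path_b[:-1])
        if same_end and disjoint:
            # The last base of the first k-mer on each branch is the allele.
            found.append((node, a[-1], b[-1]))
    return found

# Toy data (hypothetical): the two sequences differ at a single position.
individual_1 = "ATCGGATTCAGCTAGGCA"
individual_2 = "ATCGGATTCGGCTAGGCA"
graph = build_graph([individual_1, individual_2], k=5)
print(isolated_snps(graph, k=5))
```

Real reads are short, noisy and come from both strands, so a practical tool would also need k-mer counting, error filtering and canonical k-mers; the sketch leaves those out on purpose.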

RESULTS

We introduce what is, to our knowledge, the first distributed pipeline for detecting isolated SNPs (single nucleotide polymorphisms), which runs in parallel on the computational resources of multiple machines. The pipeline analyzes large datasets efficiently by using a distributed representation of the de Bruijn graph. Furthermore, we introduce a cluster-driven algorithm that partitions the de Bruijn graph across multiple independent machines according to the inner structure of the sequences under analysis, further improving the scalability of the pipeline.
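The abstract does not spell out the cluster-driven partitioning algorithm, so the sketch below does not attempt to reproduce it. It only illustrates why a structure-aware assignment of k-mers to machines can matter, by comparing plain hash partitioning with minimizer-based partitioning, a well-known technique used here as a stand-in. The metric is the fraction of graph edges whose two endpoints land on different partitions, since such edges would require communication between machines. Function names, parameters and the toy sequence are assumptions made for illustration.

```python
import zlib

def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def hash_partition(kmer, n_parts):
    """Baseline: assign a k-mer by a deterministic hash of the whole k-mer."""
    return zlib.crc32(kmer.encode()) % n_parts

def minimizer(kmer, m):
    """Lexicographically smallest substring of length m inside the k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

def minimizer_partition(kmer, n_parts, m):
    """Stand-in structure-aware scheme: assign by the hash of the minimizer.
    Consecutive k-mers often share a minimizer, so they tend to stay together."""
    return zlib.crc32(minimizer(kmer, m).encode()) % n_parts

def cut_fraction(seq, k, assign):
    """Fraction of consecutive k-mer pairs (graph edges) split across partitions."""
    ks = kmers(seq, k)
    edges = list(zip(ks, ks[1:]))
    cut = sum(assign(u) != assign(v) for u, v in edges)
    return cut / len(edges)

# Toy input (hypothetical); real inputs would be millions of reads.
seq = "ATCGGATTCAGCTAGGCA"
n_parts, k, m = 4, 5, 3
print("hash      :", cut_fraction(seq, k, lambda km: hash_partition(km, n_parts)))
print("minimizer :", cut_fraction(seq, k, lambda km: minimizer_partition(km, n_parts, m)))
```

With minimizer-based assignment, an edge can be cut only when the two k-mers have different minimizers, which bounds the communication cost by the number of minimizer changes along the sequence. Whether this beats hashing on a given dataset, and how it compares with the paper's algorithm, is not something this sketch establishes.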

CONCLUSIONS

Experiments on real-world datasets show that our pipeline performs well in terms of efficiency, output quality and scalability. The results also confirm that a specialized partitioning algorithm for the distributed representation of the de Bruijn graph yields a significant speed-up over standard partitioning techniques.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/51970992f0a0/12864_2025_11722_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/caba1a45809d/12864_2025_11722_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/2b1455056431/12864_2025_11722_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/bb709a69185c/12864_2025_11722_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/8718471324c4/12864_2025_11722_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/88b14eb1ef26/12864_2025_11722_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/92e4cbabf835/12864_2025_11722_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a5b/12131334/db3f8fde9ea4/12864_2025_11722_Fig8_HTML.jpg

