Benchmarking of variant calling software for whole-exome sequencing using gold standard datasets.

Authors

Wong Matthew, Liew Bryan, Hum Melissa, Lee Ning Yuan, Lee Ann S G

Affiliations

Division of Cellular and Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore.

SingHealth Duke-NUS Oncology Academic Clinical Programme (ONCO ACP), Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.

Publication

Sci Rep. 2025 Apr 21;15(1):13697. doi: 10.1038/s41598-025-97047-7.

DOI: 10.1038/s41598-025-97047-7
PMID: 40258889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12012014/

Abstract

Accurate variant calling from whole-exome sequencing (WES) data is vital for understanding genetic diseases. Recently, commercial variant calling software packages have emerged that do not require bioinformatics or programming expertise, enabling smaller laboratories and clinics to analyse WES data independently and circumventing the need for dedicated, expensive computers and bioinformatics staff. This study benchmarks four such packages, namely Illumina BaseSpace Sequence Hub (Illumina), CLC Genomics Workbench (CLC), Partek Flow, and Varsome Clinical, on three Genome in a Bottle (GIAB) whole-exome sequencing datasets (HG001, HG002 and HG003). Following alignment of sequence reads to the human reference genome GRCh38, variants were compared against high-confidence regions from GIAB datasets and assessed using the Variant Calling Assessment Tool (VCAT). Illumina's DRAGEN Enrichment achieved the highest precision and recall scores for single nucleotide variant (SNV) and insertion/deletion (indel) calling, at over 99% for SNVs and 96% for indels, while Partek Flow, using unionised variant calls from Freebayes and Samtools, had the lowest indel calling performance. Illumina had the highest true positive (TP) variant counts for all samples, and all four packages shared 98-99% similarity of TP variants. Run times were shortest for CLC and Illumina, ranging from 6 to 25 min and 29 to 36 min respectively, while Partek Flow took the longest (3.6 to 29.7 h). This study provides information to help clinicians and biologists without programming expertise select variant analysis software that balances accuracy, sensitivity, and runtime.
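
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision and recall are conventionally derived from true positive (TP), false positive (FP), and false negative (FN) counts. The function name and the example counts are hypothetical and are not taken from the paper or from VCAT, whose exact computation is not described here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from benchmark counts.

    tp: called variants that match the truth set
    fp: called variants absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the study).
precision, recall, f1 = precision_recall_f1(tp=9900, fp=50, fn=100)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Recall here corresponds to the "sensitivity" mentioned in the abstract's closing sentence: the fraction of truth-set variants that a caller recovers.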

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6325/12012014/ff96bb900c66/41598_2025_97047_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6325/12012014/3851cb35834c/41598_2025_97047_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6325/12012014/e30c344010e1/41598_2025_97047_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6325/12012014/ed8d1fc750ab/41598_2025_97047_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6325/12012014/53437cef54be/41598_2025_97047_Fig5_HTML.jpg
