Wong Matthew, Liew Bryan, Hum Melissa, Lee Ning Yuan, Lee Ann S G
Division of Cellular and Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore.
SingHealth Duke-NUS Oncology Academic Clinical Programme (ONCO ACP), Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
Sci Rep. 2025 Apr 21;15(1):13697. doi: 10.1038/s41598-025-97047-7.
Accurate variant calling from whole-exome sequencing (WES) data is vital for understanding genetic diseases. Recently, commercial variant calling software packages have emerged that do not require bioinformatics or programming expertise, enabling smaller laboratories and clinics to analyse WES data independently and circumventing the need for dedicated, expensive computers and bioinformatics staff. This study benchmarks four such non-programming software packages, namely Illumina BaseSpace Sequence Hub (Illumina), CLC Genomics Workbench (CLC), Partek Flow, and Varsome Clinical, on three Genome in a Bottle (GIAB) WES datasets (HG001, HG002 and HG003). Following alignment of sequence reads to the human reference genome GRCh38, variants were compared against high-confidence regions from the GIAB datasets and assessed using the Variant Calling Assessment Tool (VCAT). Illumina's DRAGEN Enrichment achieved the highest precision and recall scores for single nucleotide variant (SNV) and insertion/deletion (indel) calling, at over 99% for SNVs and 96% for indels, while Partek Flow, using the union of variant calls from Freebayes and Samtools, had the lowest indel calling performance. Illumina had the highest true-positive (TP) variant counts for all samples, and all four software packages shared 98-99% similarity of TP variants. Run times were shortest for CLC and Illumina, ranging from 6 to 25 min and 29 to 36 min respectively, while Partek Flow took the longest (3.6 to 29.7 h). This study provides information to help clinicians and biologists without programming expertise select variant analysis software that balances accuracy, sensitivity, and runtime.
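For readers unfamiliar with the precision and recall scores reported above, the following is a minimal sketch of how such metrics are conventionally derived from true-positive (TP), false-positive (FP) and false-negative (FN) counts. It is not VCAT's implementation; the `CallCounts` class, the function names and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical container for benchmark counts of one variant type."""
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth-set variants that were not called


def precision(c: CallCounts) -> float:
    # Fraction of reported calls that are correct: TP / (TP + FP).
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: CallCounts) -> float:
    # Fraction of truth-set variants recovered: TP / (TP + FN).
    truth = c.tp + c.fn
    return c.tp / truth if truth else 0.0


def f1_score(c: CallCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (assumed, not study data).
example = CallCounts(tp=990, fp=10, fn=10)
print(f"precision={precision(example):.4f}")
print(f"recall={recall(example):.4f}")
print(f"F1={f1_score(example):.4f}")
```

Recall here corresponds to the sensitivity mentioned in the abstract; a tool can score high on one metric and lower on the other, which is why both are reported separately for SNVs and indels.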