National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China.
Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China.
J Mol Diagn. 2021 Mar;23(3):285-299. doi: 10.1016/j.jmoldx.2020.11.010. Epub 2020 Dec 18.
Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, reaching a satisfactory level of robustness and standardization in clinical practice remains challenging when the currently available bioinformatics pipelines are used to detect variants from raw sequencing data. Moreover, appropriate reference data sets are lacking for clinical bioinformatics pipeline development, validation, and proficiency testing. Herein, we developed the Variant Benchmark tool (VarBen), open-source variant simulation software that generates customized reference data sets by directly editing the original sequencing reads. VarBen can introduce a variety of variants, including single-nucleotide variants, small insertions and deletions, and large structural variants, into targeted, exome, or whole-genome sequencing data, and can handle sequencing data from both the Illumina and Ion Torrent sequencing platforms. To demonstrate the feasibility and robustness of VarBen, we performed variant simulation on different sequencing data sets and compared the simulated variants with real-world data. The validation study showed that the simulated data are highly comparable to real-world data and that VarBen is a reliable tool for variant simulation. In addition, our collaborative study of somatic variant calling in 20 laboratories emphasizes the need for laboratories to evaluate their bioinformatics pipelines with customized reference data sets. VarBen may help users develop and validate their bioinformatics pipelines using locally generated sequencing data.
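To make the idea of generating a reference data set by editing reads concrete, the following is a minimal sketch of spiking a single-nucleotide variant into a set of reads at a chosen allele fraction. It is not VarBen's implementation: real tools work on aligned BAM records with CIGAR strings, base qualities, and read pairs, whereas this sketch assumes ungapped reads held in memory. The names `Read`, `spike_snv`, and `allele_fraction` are hypothetical and exist only for illustration.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for an aligned read (ungapped, 0-based start)."""
    name: str
    start: int
    seq: str

    def covers(self, pos: int) -> bool:
        return self.start <= pos < self.start + len(self.seq)


def spike_snv(reads: List[Read], pos: int, alt: str, vaf: float,
              seed: int = 0) -> List[Read]:
    """Return a copy of `reads` with `alt` written at reference position
    `pos` in round(vaf * depth) of the reads that cover `pos`."""
    if len(alt) != 1 or not 0.0 <= vaf <= 1.0:
        raise ValueError("alt must be one base and vaf must lie in [0, 1]")
    covering = [i for i, r in enumerate(reads) if r.covers(pos)]
    n_edit = round(vaf * len(covering))
    # Seeded sampling keeps the simulated data set reproducible.
    chosen = set(random.Random(seed).sample(covering, n_edit))
    edited = []
    for i, r in enumerate(reads):
        if i in chosen:
            offset = pos - r.start
            r = replace(r, seq=r.seq[:offset] + alt + r.seq[offset + 1:])
        edited.append(r)
    return edited


def allele_fraction(reads: List[Read], pos: int, alt: str) -> float:
    """Fraction of reads covering `pos` that carry `alt` there."""
    covering = [r for r in reads if r.covers(pos)]
    if not covering:
        return 0.0
    hits = sum(1 for r in covering if r.seq[pos - r.start] == alt)
    return hits / len(covering)


# Toy input: 20 all-A reads of length 10 with staggered starts (assumed data).
reads = [Read(f"r{i}", start=i % 5, seq="A" * 10) for i in range(20)]
edited = spike_snv(reads, pos=6, alt="T", vaf=0.25)
print(allele_fraction(edited, 6, "T"))
```

Because the edited reads keep their original lengths and positions, the only change a variant caller should see is the introduced allele, which is what makes such data usable as a known-truth benchmark.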