Benchmarking germline CNV calling tools from exome sequencing data.

Affiliations

Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia.

Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia.

Publication information

Sci Rep. 2021 Jul 13;11(1):14416. doi: 10.1038/s41598-021-93878-2.

Abstract

Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and its potential to detect copy number variations (CNVs) of various sizes (from 1-2 exons to several Mb). Previous comparisons of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, because no gold-standard CNV set exists, the results of those comparisons are limited and cannot be compared with one another. Here, we aimed to perform a comprehensive analysis of the currently available tools capable of germline CNV calling, using a single CNV standard and reference sample set. By compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (a pilot National Institute of Standards and Technology Reference Material) comprising 110,050 CNV or non-CNV exons. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample, with 10 correlated exomes as a reference set, with respect to length distribution, concordance, and efficiency. Each algorithm detected CNVs within a certain range of lengths and showed low concordance with other tools. Most tools focus on detecting a limited number of CNVs one to seven exons long, with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV detected a wide range of variations but showed low precision. Under unified comparison, the tools were not equivalent. The analysis allows choosing the algorithms or ensembles of algorithms most suitable for a specific goal, e.g. population studies or medical genetics.
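
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how exon-level precision, recall, and pairwise concordance could be computed against a standard that labels exons as CNV or non-CNV. The abstract does not give the authors' exact formulas, so the metric definitions here (including Jaccard overlap as the concordance measure) are assumptions. The exon IDs, tool names, and the functions `precision_recall` and `jaccard` are hypothetical, introduced only for illustration.

```python
from itertools import combinations


def precision_recall(called, truth_cnv, truth_non_cnv):
    """Exon-level precision and recall against a labelled standard.

    Called exons that appear in neither truth set are ignored, because
    the standard says nothing about them (an assumption of this sketch).
    """
    tp = len(called & truth_cnv)        # called and labelled CNV
    fp = len(called & truth_non_cnv)    # called but labelled non-CNV
    fn = len(truth_cnv - called)        # labelled CNV but not called
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def jaccard(a, b):
    """Concordance between two call sets as shared exons over all exons."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")


# Hypothetical toy standard and call sets (not data from the paper).
truth_cnv = {"e1", "e2", "e3", "e4"}
truth_non_cnv = {"e5", "e6", "e7", "e8"}
calls = {
    "tool_a": {"e1", "e2", "e5"},
    "tool_b": {"e2", "e3", "e4", "e6", "e7", "e9"},
}

for name, called in calls.items():
    p, r = precision_recall(called, truth_cnv, truth_non_cnv)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")

for (name_a, set_a), (name_b, set_b) in combinations(calls.items(), 2):
    print(f"{name_a} vs {name_b}: jaccard={jaccard(set_a, set_b):.3f}")
```

Under these definitions, the share of a tool's calls that land on non-CNV exons is 1 - precision. That is one possible reading of the "false-positive rate" quoted in the abstract, not a confirmed one.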

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99a4/8277855/63c3dac36268/41598_2021_93878_Fig1_HTML.jpg
