Zhang Xiaolei, Minikel Eric V, O'Donnell-Luria Anne H, MacArthur Daniel G, Ware James S, Weisburd Ben
National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK.
Royal Brompton Cardiovascular Research Centre, Royal Brompton & Harefield Hospitals NHS Trust, London, SW3 6NP, UK.
Wellcome Open Res. 2017 May 23;2:33. doi: 10.12688/wellcomeopenres.11640.1. eCollection 2017.
This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38, and for mono-allelic variants and complex multi-allelic variants. The tables are also augmented with allele frequencies from the ExAC and gnomAD datasets, which are often consulted when analyzing ClinVar variants. Overall, this work provides ClinVar data in a format that is easier to work with and can be loaded directly into a variety of popular analysis tools such as R, Python pandas, and SQL databases.
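As an illustration of loading such a tab-delimited table into pandas and a SQL database, the following is a minimal sketch rather than the repository's own code. The column names (`chrom`, `pos`, `ref`, `alt`, `clinical_significance`), the sample rows, and the helper functions `load_table` and `to_sqlite` are assumptions made for this example; a small in-memory table stands in for a real release file.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for one of the tab-delimited tables.
# Column names and rows are invented for illustration only.
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tclinical_significance\n"
    "1\t12345\tA\tG\tPathogenic\n"
    "2\t67890\tC\tT\tBenign\n"
    "X\t13579\tG\tA\tPathogenic\n"
)


def load_table(source):
    """Read a tab-delimited table from a path or a file-like object."""
    # Keep chromosome names as strings so "1" and "X" share one dtype.
    return pd.read_csv(source, sep="\t", dtype={"chrom": str})


def to_sqlite(df, conn, table_name="variants"):
    """Write the DataFrame to a SQLite table, replacing any existing one."""
    df.to_sql(table_name, conn, index=False, if_exists="replace")


# Load the table into pandas and filter it in memory.
df = load_table(io.StringIO(SAMPLE_TSV))
pathogenic = df[df["clinical_significance"] == "Pathogenic"]

# Load the same table into an in-memory SQLite database and query it.
conn = sqlite3.connect(":memory:")
to_sqlite(df, conn)
n_pathogenic = conn.execute(
    "SELECT COUNT(*) FROM variants WHERE clinical_significance = ?",
    ("Pathogenic",),
).fetchone()[0]

print(len(pathogenic), n_pathogenic)
```

With a real file, `load_table` could be given a file path instead of the in-memory buffer; pandas infers gzip compression from a `.gz` file extension by default, so a compressed table would not need to be unpacked first.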