Guerra-Assunção José Afonso, Conde Lucia, Moghul Ismail, Webster Amy P, Ecker Simone, Chervova Olga, Chatzipantsiou Christina, Prieto Pablo P, Beck Stephan, Herrero Javier
Infection and Immunity, University College London, London, United Kingdom.
Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London, United Kingdom.
Front Genet. 2020 Sep 24;11:518644. doi: 10.3389/fgene.2020.518644. eCollection 2020.
In recent years, there has been a significant increase in whole-genome sequencing data for individual genomes, produced by research projects as well as direct-to-consumer service providers. While many of these sources provide their users with an interpretation of the data, there is a lack of free, open tools for generating reports that explore the data in an easy-to-understand manner. GenomeChronicler was developed as part of the Personal Genome Project UK (PGP-UK) to address this need. PGP-UK provides genomic, transcriptomic, epigenomic and self-reported phenotypic data under an open-access model with full ethical approval. As a result, the reports generated by GenomeChronicler are intended for research purposes only and include information relating to potentially beneficial and potentially harmful variants, but without clinical curation. GenomeChronicler can be used with data from whole-genome or whole-exome sequencing, producing a genome report containing information on variant statistics, ancestry and known associated phenotypic traits. Example reports are available from the PGP-UK data page (personalgenomes.org.uk/data). The objective of this method is to leverage existing resources to find known phenotypes associated with the genotypes detected in each sample. The provided trait data is based primarily upon information available in SNPedia, but also collates data from ClinVar, GETevidence, and gnomAD to provide additional details on potential health implications, the presence of each genotype in other PGP participants, and the population frequency of each genotype. The analysis can be run in a self-contained environment without requiring internet access, making it a good choice for cases where privacy is essential or desired: any third-party project can embed GenomeChronicler within its offline safe-haven environment. GenomeChronicler can be run for one sample at a time, or in parallel using the Nextflow workflow manager.
The source code is available from GitHub (https://github.com/PGP-UK/GenomeChronicler). Container recipes are available for Docker and Singularity, and a pre-built container is available from SingularityHub (https://singularity-hub.org/collections/3664), enabling easy deployment in a variety of settings. Users without access to the computational resources needed to run GenomeChronicler can access the software from the Lifebit CloudOS platform (https://lifebit.ai/cloudos), which enables the production of reports and variant calls from raw sequencing data in a scalable fashion.
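The core idea described above, matching the genotypes detected in a sample against a locally stored table of known genotype-phenotype associations, can be illustrated with a minimal Python sketch. This is not GenomeChronicler's actual implementation: the `Annotation` type, the `TRAITS` table, the `normalise` and `annotate` functions, and the `rs0000001`-style identifiers are all hypothetical placeholders, and the trait entries are invented for illustration rather than taken from SNPedia or any other resource. The sketch assumes unphased genotypes, so allele order is ignored.

```python
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    summary: str
    effect: str  # "beneficial", "harmful", or "neutral"


def normalise(genotype: str) -> str:
    """Return alleles upper-cased and sorted so 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))


# Hypothetical offline trait table keyed by (variant id, genotype).
# Identifiers and descriptions are placeholders, not real database records.
TRAITS: Dict[Tuple[str, str], Annotation] = {
    ("rs0000001", "AG"): Annotation("placeholder trait A", "beneficial"),
    ("rs0000002", "TT"): Annotation("placeholder trait B", "harmful"),
}


def annotate(sample: Dict[str, str]) -> List[Tuple[str, str, Annotation]]:
    """Look up each genotype in the sample against the local trait table."""
    hits = []
    for rsid, genotype in sorted(sample.items()):
        key = (rsid, normalise(genotype))
        if key in TRAITS:
            hits.append((rsid, key[1], TRAITS[key]))
    return hits


if __name__ == "__main__":
    # A made-up sample: variant id -> called genotype.
    sample = {"rs0000001": "GA", "rs0000002": "CT", "rs0000003": "CC"}
    for rsid, genotype, ann in annotate(sample):
        print(f"{rsid} {genotype}: {ann.summary} ({ann.effect})")
```

Because the lookup table lives entirely in local memory, a lookup of this kind needs no network access, which is consistent with the offline, privacy-preserving mode of operation described in the abstract.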