MRC-University of Glasgow Centre for Virus Research, Glasgow, Scotland, UK.
BMC Bioinformatics. 2018 Dec 18;19(1):532. doi: 10.1186/s12859-018-2459-9.
Virus genome sequences, generated in ever-higher volumes, can provide new scientific insights and inform our responses to epidemics and outbreaks. To facilitate interpretation, such data must be organised and processed within scalable computing resources that encapsulate virology expertise. GLUE (Genes Linked by Underlying Evolution) is a data-centric bioinformatics environment for building such resources. The GLUE core data schema organises sequence data along evolutionary lines, capturing not only nucleotide data but associated items such as alignments, genotype definitions, genome annotations and motifs. Its flexible design emphasises applicability to different viruses and to diverse needs within research, clinical or public health contexts.
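As a rough illustration of what organising sequence data "along evolutionary lines" could look like, the following Python sketch arranges sequences in a tree of clades, each of which may carry member sequences and genome annotations. It is a toy model rather than GLUE's actual schema: the class names (`Sequence`, `CladeNode`), their fields, the `core` annotation coordinates and the example sequence identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Sequence:
    """A nucleotide sequence with a (hypothetical) identifier."""
    sequence_id: str
    nucleotides: str


@dataclass
class CladeNode:
    """One node in a clade tree, e.g. a species, genotype or subtype."""
    name: str
    # Exclude the parent link from repr/eq to avoid walking back up the tree.
    parent: Optional["CladeNode"] = field(default=None, repr=False, compare=False)
    children: List["CladeNode"] = field(default_factory=list)
    members: List[Sequence] = field(default_factory=list)
    # Annotation name -> (start, end) coordinates; illustrative only.
    annotations: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add_child(self, name: str) -> "CladeNode":
        """Create a child clade and link it to this node."""
        child = CladeNode(name=name, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return clade names from the root down to this node."""
        path: List[str] = []
        node: Optional[CladeNode] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

    def all_members(self) -> Iterator[Sequence]:
        """Yield sequences assigned to this clade or any descendant clade."""
        yield from self.members
        for child in self.children:
            yield from child.all_members()


# Build a tiny example tree with placeholder data.
root = CladeNode("HCV")
root.annotations["core"] = (1, 10)  # made-up coordinates
genotype_1 = root.add_child("Genotype 1")
subtype_1a = genotype_1.add_child("Subtype 1a")
genotype_1.members.append(Sequence("seq-002", "ACGA"))
subtype_1a.members.append(Sequence("seq-001", "ACGT"))

print(subtype_1a.lineage())
print(sorted(s.sequence_id for s in root.all_members()))
```

In this arrangement a query at any clade (for example, "all sequences within Genotype 1") is a walk over that clade's subtree, which is one simple way a tree-shaped schema can keep sequences, clade definitions and annotations linked together.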
HCV-GLUE is a case study GLUE resource for hepatitis C virus (HCV). It includes an interactive public web application providing sequence analysis in the form of a maximum-likelihood-based genotyping method, antiviral resistance detection and graphical sequence visualisation. HCV sequence data from GenBank is categorised and stored in a large-scale sequence alignment which is accessible via web-based queries. Whereas this web resource provides a range of basic functionality, the underlying GLUE project can also be downloaded and extended by bioinformaticians addressing more advanced questions.
GLUE can be used to rapidly develop virus sequence data resources with public health, research and clinical applications. This streamlined approach, with its focus on reuse, will help realise the full value of virus sequence data.