The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.
BMC Bioinformatics. 2013 Dec 12;14:359. doi: 10.1186/1471-2105-14-359.
DNA methylation is indispensable for normal human genome function. An increasingly large volume of DNA methylomic data is being released into the public domain, creating an opportunity to investigate the relationships between the DNA methylome, genome function, and human phenotypes. The Illumina HumanMethylation450 (450K) array is one of the most popular platforms for assessing DNA methylation, with over 10,000 samples available in the public domain. However, accessing all these data requires downloading each experiment individually, and because annotation is inconsistent, finding the right data can be a challenge.
Here we introduce 'Marmal-aid', the first standardised database for DNA methylation (freely available at http://marmal-aid.org). In Marmal-aid, the majority of publicly available Illumina HumanMethylation450 data is incorporated into a single repository, allowing the data to be re-processed, including normalisation and imputation of missing values. The database is accessible in two ways: (1) through an R package, which allows the data to be incorporated into existing analysis pipelines and easily queried to gain insight into the function of particular CpG sites; this route is aimed at bioinformaticians with experience in R. (2) Through a graphical interface that lets general biologists query a pre-defined set of tissues (currently 15), providing a reference database of the methylation state in these tissues for the 450,000 CpG sites profiled by the Illumina HumanMethylation450.
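To make the idea of re-processing and querying concrete, the following is a minimal sketch of imputing missing beta values and building a per-tissue reference for a few CpG sites. It is not the Marmal-aid R package or its actual method: the probe names, sample names, tissue labels, and both helper functions are hypothetical, and simple row-mean imputation is used purely for illustration.

```python
from statistics import mean

# Hypothetical toy data (not from Marmal-aid): beta values range from
# 0 (unmethylated) to 1 (methylated), one row per CpG probe and one
# column per sample. None marks a missing measurement.
samples = ["s1", "s2", "s3", "s4"]
tissue_of = {"s1": "blood", "s2": "blood", "s3": "liver", "s4": "liver"}
betas = {
    "cg_a": [0.10, None, 0.80, 0.90],
    "cg_b": [0.50, 0.70, None, 0.30],
}


def impute_row_mean(row):
    """Replace missing values with the mean of the probe's observed values.

    Assumption: row-mean imputation stands in for whatever imputation
    method a real pipeline would use.
    """
    observed = [v for v in row if v is not None]
    if not observed:
        raise ValueError("cannot impute a probe with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in row]


def tissue_means(row, sample_names, tissue_lookup):
    """Average a probe's beta values within each annotated tissue."""
    grouped = {}
    for sample, value in zip(sample_names, row):
        grouped.setdefault(tissue_lookup[sample], []).append(value)
    return {tissue: mean(values) for tissue, values in grouped.items()}


# Impute first, then summarise each probe by tissue.
imputed = {probe: impute_row_mean(row) for probe, row in betas.items()}
reference = {
    probe: tissue_means(row, samples, tissue_of)
    for probe, row in imputed.items()
}

for probe, by_tissue in reference.items():
    print(probe, {t: round(v, 3) for t, v in by_tissue.items()})
```

The single shared `tissue_of` mapping plays the role of a common annotation format: once every sample carries the same kind of tissue label, a per-tissue reference can be derived without handling each experiment separately.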
Marmal-aid is the largest publicly available Illumina HumanMethylation450 methylation database, combining Illumina HumanMethylation450 data from a number of sources into a single location with a single common annotation format. This allows for automated extraction using the R package and inclusion in existing analysis pipelines. Marmal-aid also provides an easy-to-use GUI for visualising methylation data in user-defined genomic regions for various reference tissues.