Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan.
BMC Bioinformatics. 2010 Aug 10;11:421. doi: 10.1186/1471-2105-11-421.
Over the past decade, gene expression microarray studies have greatly expanded our knowledge of the genetic mechanisms of human diseases. Meta-analysis of the substantial amounts of accumulated data, which integrates valuable information from multiple studies, is becoming more important in microarray research. However, collecting data of special interest from public microarray repositories often presents major practical problems. Moreover, including low-quality data may significantly reduce meta-analysis efficiency.
M2DB is a human-curated microarray database designed for easy querying based on clinical information and for interactive retrieval of either raw or uniformly pre-processed data, along with a set of quality-control (QC) metrics. The database contains more than 10,000 previously published Affymetrix GeneChip arrays performed on human clinical specimens. M2DB allows online querying by a flexible combination of five clinical annotations describing disease state and sampling location. These annotations were manually curated using controlled vocabularies, based on information obtained from GEO, ArrayExpress, and published papers. For array-based quality assessment, the online query provides sets of QC metrics generated by three available QC algorithms, so arrays with poor data quality can easily be excluded in the query interface. The query also provides values from two algorithms for gene-based filtering, and offers raw data and three kinds of pre-processed data for download.
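The query model described above (matching on a combination of clinical annotations, then excluding arrays that fail QC) can be illustrated with a minimal sketch. This is not M2DB's actual interface or schema: the `ArrayRecord` type, the annotation keys (`disease_state`, `organ`), the `qc_score` metric, and the threshold are all hypothetical names chosen for illustration, and the sketch assumes that lower QC values indicate better quality.

```python
from dataclasses import dataclass


@dataclass
class ArrayRecord:
    """One microarray with curated annotations and QC metrics (hypothetical schema)."""
    array_id: str
    annotations: dict  # e.g. {"disease_state": "tumor", "organ": "lung"}
    qc: dict           # e.g. {"qc_score": 0.2}; lower is assumed to be better


def query_arrays(records, criteria, qc_limits):
    """Return records that match every annotation criterion and pass every QC limit."""
    selected = []
    for rec in records:
        # All requested annotation values must match exactly.
        matches = all(rec.annotations.get(key) == value
                      for key, value in criteria.items())
        # Each QC metric must be present and not exceed its limit.
        passes_qc = all(name in rec.qc and rec.qc[name] <= limit
                        for name, limit in qc_limits.items())
        if matches and passes_qc:
            selected.append(rec)
    return selected


# Illustrative records only; these are not real M2DB entries.
records = [
    ArrayRecord("A1", {"disease_state": "tumor", "organ": "lung"}, {"qc_score": 0.2}),
    ArrayRecord("A2", {"disease_state": "tumor", "organ": "lung"}, {"qc_score": 0.9}),
    ArrayRecord("A3", {"disease_state": "normal", "organ": "lung"}, {"qc_score": 0.1}),
]

hits = query_arrays(
    records,
    criteria={"disease_state": "tumor", "organ": "lung"},
    qc_limits={"qc_score": 0.5},
)
print([rec.array_id for rec in hits])
```

In this sketch, an array missing a requested QC metric is excluded rather than passed through, a conservative choice that suits meta-analysis, where low-quality data may reduce efficiency.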
M2DB provides a user-friendly interface for selecting QC parameters, sample clinical annotations, and data formats, helping users obtain clinical metadata. The database lowers the entry threshold for meta-analysis and integrates the steps of the process. We hope that this research will promote the further evolution of microarray meta-analysis.