Stokes Todd H, Moffitt Richard A, Phan John H, Wang May D
Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0535, USA.
Ann Biomed Eng. 2007 Jun;35(6):1068-80. doi: 10.1007/s10439-007-9313-y. Epub 2007 Apr 26.
Quality assurance of high-throughput "-omics" data is a major concern for biomedical discovery and translational medicine, and is considered a top priority in bioinformatics and systems biology. Here, we report a web-based bioinformatics tool called caCORRECT for chip artifact detection, analysis, and CORRECTion, which removes the systematic artifactual noise commonly observed in microarray gene expression data. Despite the development of major databases such as GEO, ArrayExpress, caArray, and the SMD to manage and distribute microarray data to the public, reproducibility has been questioned in many cases, including for high-profile papers and datasets. Based on both archived and synthetic data, we have designed caCORRECT to have several advanced features: (1) to uncover significant, correctable artifacts that affect the reproducibility of experiments; (2) to improve the integrity and quality of public archives by removing artifacts; (3) to provide a universal quality score to aid users in their selection of suitable microarray data; and (4) to improve the true-positive rate of biomarker selection, as verified by test data. These features are expected to improve the reproducibility of microarray studies. caCORRECT is freely available at: http://caCORRECT.bme.gatech.edu.