Akulenko Ruslan, Merl Markus, Helms Volkhard
Center for Bioinformatics, Saarland University, Saarbruecken, Germany.
Graduate School for Computer Science, Saarland University, Saarbruecken, Germany.
PLoS One. 2016 Aug 25;11(8):e0159921. doi: 10.1371/journal.pone.0159921. eCollection 2016.
Batch effects are non-natural variations in, for example, large-scale genomic data sets. If not corrected by suitable numerical algorithms, batch effects may seriously affect the analysis of these data sets. The novel, array-platform-independent software tool BEclear enables researchers to identify those portions of the data that deviate statistically significantly from the remaining data and to replace these portions with typical values reconstructed from neighboring data entries using latent factor models. In contrast to comparable methods, which often apply some form of global normalization to the data, BEclear avoids changing the apparently unaffected parts of the data. We tested the performance of this approach on DNA methylation data for various tumor data sets taken from The Cancer Genome Atlas and compared the results to those obtained with the existing algorithms ComBat, Surrogate Variable Analysis, RUVm and Functional normalization. BEclear consistently performed on par with or better than these methods. BEclear is available as an R package from the Bioconductor project at http://bioconductor.org/packages/release/bioc/html/BEclear.html.
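To illustrate the idea of replacing flagged entries with values reconstructed from a latent factor model, the following is a minimal sketch in Python. It is not the BEclear implementation or its API. The function name `impute_latent_factor`, the parameters `rank`, `epochs`, `lr` and `reg`, the stochastic gradient descent fitting procedure, and the toy matrix are all assumptions made for illustration. The sketch assumes that batch-affected entries have already been detected and marked as `None`, and that the data are methylation beta values in the range 0 to 1.

```python
import random


def impute_latent_factor(matrix, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Fill None entries of `matrix` (rows = genes, columns = samples).

    Hypothetical sketch: fits value ~ mean + P[i] . Q[j] on the known
    entries by stochastic gradient descent, then predicts only the
    missing entries. Known entries are returned unchanged.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])

    # Collect the entries that were NOT flagged as batch-affected.
    known = [(i, j, matrix[i][j])
             for i in range(n_rows)
             for j in range(n_cols)
             if matrix[i][j] is not None]
    mean = sum(v for _, _, v in known) / len(known)

    # Small random initial latent factors for genes (P) and samples (Q).
    P = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    def predict(i, j):
        return mean + sum(P[i][k] * Q[j][k] for k in range(rank))

    for _ in range(epochs):
        rng.shuffle(known)
        for i, j, value in known:
            err = value - predict(i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step with L2 regularization on both factors.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)

    # Replace only the flagged entries; clamp to the beta-value range [0, 1].
    filled = [row[:] for row in matrix]
    for i in range(n_rows):
        for j in range(n_cols):
            if filled[i][j] is None:
                filled[i][j] = min(1.0, max(0.0, predict(i, j)))
    return filled


# Toy beta-value matrix (4 genes x 5 samples); None marks entries that a
# separate detection step (not shown) flagged as batch-affected.
beta = [
    [0.10, 0.12, None, 0.11, 0.13],
    [0.50, 0.52, 0.49, None, 0.53],
    [0.80, 0.82, 0.79, 0.81, 0.83],
    [0.30, None, 0.29, 0.31, 0.33],
]
filled = impute_latent_factor(beta)
```

Only the entries marked `None` are changed. All other values are copied through as they are, which mirrors the abstract's point that the apparently unaffected parts of the data are left untouched, in contrast to a global normalization.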