Hausen Jonas, Otte Jens C, Strähle Uwe, Hammers-Wirtz Monika, Hollert Henner, Keiter Steffen H, Ottermanns Richard
Institute for Environmental Research, RWTH Aachen University, Worringerweg 1, 52074, Aachen, Germany.
Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany.
Environ Sci Pollut Res Int. 2015 Nov;22(21):16384-92. doi: 10.1007/s11356-015-5019-0. Epub 2015 Jul 17.
Transcriptomics is often used to investigate changes in an organism's genetic response to environmental contamination. Data noise can mask the effects of contaminants, making it difficult to detect responding genes. Because the number of genes found to be differentially expressed in transcriptome data is often very large, algorithms are needed to reduce it to a few robust, discriminative genes. We present an algorithm for the aggregated analysis of transcriptome data that uses multiple fold-change thresholds (threshold screening) and p values from a Bayesian generalized linear model to assess the robustness of a gene as a potential indicator for the treatments tested. The algorithm provides a robustness indicator (ROBI) as well as a significance profile, which can be used to assess the statistical significance of a given gene at different fold-change thresholds. Using ROBI, eight discriminative genes that could serve as potential indicators for a given substance were identified from an exemplary dataset (Danio rerio FET treated with chlorpyrifos, methylmercury, and PCB). Significance profiles uncovered genetic effects and revealed appropriate fold-change thresholds for single genes or gene clusters. Fold-change threshold screening is a powerful tool for dimensionality reduction and feature selection in transcriptome data, as it effectively reduces the detected genes to a small set suitable for environmental monitoring. In addition, it can unmask patterns of altered gene expression hidden by data noise and reduces the chance of type II errors, e.g., in environmental screening.
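The general idea of screening a gene across several fold-change thresholds can be illustrated with a minimal Python sketch. The abstract does not define how ROBI is computed, so the score below is a stand-in, not the published indicator: it is simply the fraction of thresholds at which a gene passes both a fold-change cut-off and a significance cut-off. The gene names, values, thresholds, `alpha`, and `min_score` are all hypothetical, and a single p value per gene is assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str                # hypothetical gene identifier
    log2_fold_change: float  # treatment vs. control
    p_value: float           # assumed to come from some upstream model


def significance_profile(result, thresholds, alpha=0.05):
    """Return one boolean per threshold: does the gene pass both filters?"""
    return [
        abs(result.log2_fold_change) >= t and result.p_value < alpha
        for t in thresholds
    ]


def robustness_score(profile):
    """Stand-in score (NOT the paper's ROBI): fraction of thresholds passed."""
    return sum(profile) / len(profile)


def screen(results, thresholds, alpha=0.05, min_score=0.75):
    """Keep genes whose stand-in score reaches min_score."""
    scores = {
        r.gene: robustness_score(significance_profile(r, thresholds, alpha))
        for r in results
    }
    return {gene: s for gene, s in scores.items() if s >= min_score}


# Illustrative, made-up input values.
thresholds = [0.5, 1.0, 1.5, 2.0]
results = [
    GeneResult("geneA", 2.3, 0.001),
    GeneResult("geneB", 1.2, 0.01),
    GeneResult("geneC", 3.0, 0.2),
    GeneResult("geneD", -1.7, 0.03),
]
selected = screen(results, thresholds)
print(selected)
```

In this toy input, a gene with a large but non-significant fold change (`geneC`) is dropped, while a gene with a moderate, consistently significant change (`geneD`) is kept. That is the kind of trade-off a multi-threshold screen is meant to expose.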