Ansari Sahar, Voichita Calin, Donato Michele, Tagett Rebecca, Draghici Sorin
Department of Computer Science, Wayne State University, Detroit, MI, USA.
Proc IEEE Inst Electr Electron Eng. 2017 Mar;105(3):482-495. doi: 10.1109/JPROC.2016.2531000. Epub 2016 Mar 24.
A crucial step in understanding any phenotype is the correct identification of the signaling pathways that are significantly impacted in that phenotype. However, most current pathway analysis methods produce both false positives and false negatives in certain circumstances. We hypothesized that such incorrect results arise because existing methods fail to distinguish between the primary dis-regulation of a given gene itself and the effects of signaling coming from upstream. Furthermore, a modern whole-genome experiment performed with a next-generation technology expends a great deal of effort to measure the entire set of 30,000-100,000 transcripts in the genome. This is followed by the selection of a few hundred differentially expressed genes, a step that discards more than 99% of the collected data. We also hypothesized that such drastic filtering could discard many genes that play crucial roles in the phenotype. We propose a novel topology-based pathway analysis method that identifies significantly impacted pathways using the entire set of measurements, thus allowing full use of the data provided by next-generation sequencing (NGS) techniques. The results obtained on 24 real data sets involving 12 different human diseases, as well as on 8 yeast knock-out data sets, show that the proposed method yields significant improvements over the state-of-the-art methods SPIA, GSEA, and GSA.
Primary dis-regulation analysis is implemented in R and included in the ROntoTools Bioconductor package (versions ≥ 2.0.0): https://www.bioconductor.org/packages/release/bioc/html/ROntoTools.html
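The abstract's remark that selecting differentially expressed genes discards more than 99% of the collected data can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the published method: the function name `discarded_fraction` and the specific counts of retained genes (200, 300, 500) are assumptions chosen to represent "a few hundred"; only the 30,000-100,000 transcript range comes from the abstract.

```python
def discarded_fraction(measured: int, selected: int) -> float:
    """Return the fraction of measured transcripts dropped by the filter.

    `measured` is the number of transcripts quantified in the experiment and
    `selected` is the number kept as differentially expressed (hypothetical
    values; the abstract only says "a few hundred").
    """
    if measured <= 0 or not 0 <= selected <= measured:
        raise ValueError("need 0 <= selected <= measured and measured > 0")
    return 1.0 - selected / measured


# Transcript counts from the abstract; selected counts are assumed examples.
for measured in (30_000, 100_000):
    for selected in (200, 300, 500):
        frac = discarded_fraction(measured, selected)
        print(f"{measured:>7} measured, {selected} kept: {frac:.2%} discarded")
```

Under these assumed counts the discarded share is at or near 99% across most of the range, which is the motivation given for a method that uses the entire set of measurements.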