Misra Sanchit, Pamnany Kiran, Aluru Srinivas
IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):1008-20. doi: 10.1109/TCBB.2015.2415931.
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor that takes advantage of its multi-level parallelism, including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information, with permutation testing to assess statistical significance. We demonstrate the first-ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575-gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor offers lessons that are applicable to other domains.
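To make the core idea concrete, here is a minimal sketch of scoring one gene pair by mutual information and assessing it with a permutation test. It is an illustration only, not the paper's implementation: the equal-width histogram estimator, the function names (`discretize`, `mutual_information`, `permutation_test`), and the parameters `bins` and `n_perm` are all assumptions introduced here, and nothing in it reflects the Xeon Phi threading or vectorization work. At whole-genome scale the same scoring would have to cover every pair among 15,575 genes, roughly 1.2 × 10^8 pairs, which is why the parallelization matters.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats. Assumed estimator."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (cx[i] * cy[j]))
               for (i, j), c in cxy.items())


def permutation_test(x, y, n_perm=200, bins=8, seed=0):
    """Return (observed MI, p-value) against a shuffled-profile null."""
    rng = random.Random(seed)
    observed = mutual_information(x, y, bins)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling breaks any dependence on x
        if mutual_information(x, shuffled, bins) >= observed:
            exceed += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)
```

A call such as `permutation_test(profile_a, profile_b)` on two expression profiles of equal length returns the observed score and an empirical p-value. In a network setting, an edge would be kept only when the p-value falls below a chosen threshold; the threshold itself is a design choice that this sketch leaves open.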