Niu Zhen, Chasman Deborah, Eisfeld Amie J, Kawaoka Yoshihiro, Roy Sushmita
Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA.
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA.
Bioinformatics. 2016 May 15;32(10):1509-17. doi: 10.1093/bioinformatics/btw007. Epub 2016 Jan 21.
Identifying the shared and pathogen-specific components of host transcriptional regulatory programs is important for understanding the principles of regulation of immune response. Recent efforts in systems biology studies of infectious diseases have resulted in a large collection of datasets measuring host transcriptional response to various pathogens. Computational methods to identify and compare gene expression modules across different infections offer a powerful way to identify strain-specific and shared components of the regulatory program. An important challenge is to identify statistically robust gene expression modules as well as to reliably detect genes that change their module memberships between infections.
We present MULCCH (MULti-task spectral Consensus Clustering for Hierarchically related tasks), a consensus extension of a multi-task clustering algorithm that infers high-confidence, strain-specific host response modules under infection by multiple virus strains. On simulated data, MULCCH identifies genes exhibiting pathogen-specific patterns more accurately than non-consensus and non-multi-task clustering approaches. Applied to the mammalian transcriptional response to a panel of influenza viruses, MULCCH identified clusters with greater coherence than non-consensus methods. Further, MULCCH-derived clusters are enriched for several immune system-related processes and regulators. In summary, MULCCH provides a reliable module-based approach for identifying molecular pathways and gene sets that characterize the commonality and specificity of host response to viruses of different pathogenicities.
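The abstract does not spell out how MULCCH builds its consensus, so the following is only a minimal sketch of the generic idea behind consensus clustering, not the MULCCH algorithm. It assumes you already have several clustering runs for a single task (for example, one infection), summarizes them in a co-clustering matrix (the fraction of runs in which two genes share a cluster), and keeps gene pairs above a cutoff as "high-confidence". The function names `co_clustering_matrix` and `high_confidence_pairs` and the 0.8 threshold are hypothetical; MULCCH additionally couples clusterings across hierarchically related tasks, which this sketch does not attempt.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def co_clustering_matrix(runs: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of runs in which each pair of genes receives the same label.

    runs[r][g] is the cluster label of gene g in run r. Labels only need to be
    consistent within a run, so label permutations between runs do not matter.
    (Hypothetical helper for illustration, not part of MULCCH.)
    """
    if not runs:
        raise ValueError("need at least one clustering run")
    n = len(runs[0])
    if any(len(run) != n for run in runs):
        raise ValueError("all runs must label the same genes")
    counts = [[0] * n for _ in range(n)]
    for run in runs:
        for i in range(n):
            for j in range(n):
                if run[i] == run[j]:
                    counts[i][j] += 1
    k = len(runs)
    return [[c / k for c in row] for row in counts]


def high_confidence_pairs(
    consensus: List[List[float]], threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Gene pairs (i < j) co-clustered in at least `threshold` of the runs."""
    n = len(consensus)
    return [
        (i, j)
        for i, j in combinations(range(n), 2)
        if consensus[i][j] >= threshold
    ]


if __name__ == "__main__":
    # Four toy runs over four genes; run 3 uses swapped label names.
    runs = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
    m = co_clustering_matrix(runs)
    print(high_confidence_pairs(m, threshold=0.8))
```

In this toy input, genes 0 and 1 are grouped together in every run, while genes 2 and 3 are split once. Only the first pair therefore passes the 0.8 cutoff. A consensus matrix like this can then be re-clustered (for example, spectrally) to obtain more stable modules than any single run.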
The source code is available at https://bitbucket.org/roygroup/mulcch
Supplementary data are available at Bioinformatics online.