Center for Computational Biology & Bioinformatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA.
Department of Medicine, University of California San Diego, La Jolla, CA, USA.
Nat Protoc. 2023 Jun;18(6):1745-1759. doi: 10.1038/s41596-022-00797-1. Epub 2023 Jan 18.
A longstanding goal of biomedicine is to understand how alterations in molecular and cellular networks give rise to the spectrum of human diseases. For diseases with shared etiology, understanding the common causes allows for improved diagnosis of each disease, development of new therapies and more comprehensive identification of disease genes. Accordingly, this protocol describes how to evaluate the extent to which two diseases, each characterized by a set of mapped genes, are colocalized in a reference gene interaction network. This procedure uses network propagation to measure the network 'distance' between gene sets. For colocalized diseases, the network can be further analyzed to extract common gene communities at progressive granularities. In particular, we show how to: (1) obtain input gene sets and a reference gene interaction network; (2) identify common subnetworks of genes that encompass or are in close proximity to all gene sets; (3) use multiscale community detection to identify systems and pathways represented by each common subnetwork to generate a network colocalized systems map; (4) validate identified genes and systems using a mouse variant database; and (5) visualize and further investigate select genes, interactions and systems for relevance to phenotype(s) of interest. We demonstrate the utility of this approach by identifying shared biological mechanisms underlying autism and congenital heart disease. However, this protocol is general and can be applied to any gene sets attributed to diseases or other phenotypes with suspected joint association. A typical NetColoc run takes less than an hour. Software and documentation are available at https://github.com/ucsd-ccbb/NetColoc.
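The idea of using network propagation to judge how close two gene sets sit in a network can be illustrated with a minimal sketch. The code below is not the NetColoc API and does not reproduce its scoring or null model; it is a generic random walk with restart on a toy five-node path graph, and the function name `propagate`, the restart weight `alpha`, the seed sets and the product-based `score` are all assumptions chosen for illustration.

```python
import numpy as np

def propagate(adj, seeds, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart from `seeds` (hypothetical helper, not NetColoc)."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=0)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated nodes
    w = adj / degree                   # column-normalized: each column sums to 1
    f0 = np.zeros(adj.shape[0])
    f0[list(seeds)] = 1.0 / len(seeds) # unit heat spread evenly over the seeds
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * (w @ f) + (1 - alpha) * f0
        if np.abs(f_next - f).sum() < tol:
            return f_next
        f = f_next
    return f

# Toy network (assumption): a path 0 - 1 - 2 - 3 - 4
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

heat_a = propagate(adj, seeds=[0])     # stand-in for the first disease gene set
heat_b = propagate(adj, seeds=[4])     # stand-in for the second disease gene set

# Illustrative per-node proximity to both sets: large only where both heats are large.
score = heat_a * heat_b
print(np.round(heat_a, 4), np.round(heat_b, 4), np.round(score, 6))
```

In a real analysis the raw heat values would need to be compared against a null distribution (for example, propagation from randomly chosen seed genes) before any node is called proximal to both sets; that step is omitted here.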