Wang Ming-Chih, Chen Feng-Chi, Chen Yen-Zho, Huang Yao-Ting, Chuang Trees-Juen
Genomics Research Center, Academia Sinica, Taipei, 11529, Taiwan.
BMC Res Notes. 2012 May 2;5:212. doi: 10.1186/1756-0500-5-212.
Complex human diseases may be associated with many gene interactions. Gene interactions take several different forms, and it is difficult to identify all of the interactions that are potentially associated with human diseases. One approach that may fill this knowledge gap is to infer previously unknown gene interactions by identifying non-physical linkages between different mutations (or single nucleotide polymorphisms, SNPs), that is, linkages that are unlikely to result from the hitchhiking effect or from a lack of recombination. Strong non-physical SNP linkages are considered an indication of biological (gene) interactions. These interactions can be physical protein interactions, regulatory interactions, functional compensation or antagonism, or many other forms of interaction. Previous studies have shown that mutations in different genes can be linked to the same disorders. Therefore, non-physical SNP linkages, coupled with knowledge of SNP-disease associations, may shed more light on the role of gene interactions in human disorders. A user-friendly web resource that integrates information about non-physical SNP linkages, gene annotations, SNP information, and SNP-disease associations may thus be a good reference for biomedical research.
Here we extracted the SNPs located within the promoter or exonic regions of protein-coding genes from the HapMap database to construct a database named the Linkage-Disequilibrium-based Gene Interaction database (LDGIdb). The database stores 646,203 potential human gene interactions, inferred from SNP pairs that are subject to long-range strong linkage disequilibrium (LD), that is, non-physical linkage. To minimize the possibility of hitchhiking, SNP pairs inferred to be non-physically linked were required to be located on different chromosomes or in different LD blocks of the same chromosome. According to the genomic locations of the SNPs involved (i.e., promoter, untranslated region (UTR) and coding region (CDS)), the inferred SNP linkages were categorized into promoter-promoter, promoter-UTR, promoter-CDS, CDS-CDS, CDS-UTR and UTR-UTR linkages. For the CDS-related linkages, the coding SNPs were further classified into nonsynonymous and synonymous variations, which represent potential gene interactions at the protein and RNA level, respectively. The LDGIdb also incorporates human disease-association resources such as Genome-Wide Association Studies (GWAS) data and Online Mendelian Inheritance in Man (OMIM), so that the user can search for potential disease-associated SNP linkages. The inferred SNP linkages are also classified by population to provide a resource for investigating potential population-specific gene interactions.
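The filtering and categorization rules described above can be illustrated with a minimal Python sketch. It is not the LDGIdb implementation: the `Snp` record, its field names (including the integer `ld_block` index), and the helper functions are all hypothetical, and the sketch assumes the SNP pairs passed to it have already been found to be in strong LD by some upstream step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; field names are illustrative assumptions."""
    rsid: str
    gene: str
    chrom: str
    ld_block: int                          # assumed LD-block index within the chromosome
    region: str                            # "promoter", "UTR", or "CDS"
    nonsynonymous: Optional[bool] = None   # only meaningful for CDS SNPs


# Ordering chosen so that category names match those listed in the text
# (promoter-UTR, promoter-CDS, CDS-UTR, and so on).
REGION_ORDER = {"promoter": 0, "CDS": 1, "UTR": 2}


def is_non_physical(a: Snp, b: Snp) -> bool:
    """True if the pair lies on different chromosomes, or in different
    LD blocks of the same chromosome."""
    return a.chrom != b.chrom or a.ld_block != b.ld_block


def linkage_category(a: Snp, b: Snp) -> str:
    """Name the linkage by the regions of its two SNPs, e.g. 'promoter-CDS'."""
    first, second = sorted((a.region, b.region), key=REGION_ORDER.__getitem__)
    return f"{first}-{second}"


def cds_level(snp: Snp) -> Optional[str]:
    """For a coding SNP, return the level of the potential interaction:
    'protein' for nonsynonymous, 'RNA' for synonymous; None otherwise."""
    if snp.region != "CDS" or snp.nonsynonymous is None:
        return None
    return "protein" if snp.nonsynonymous else "RNA"


if __name__ == "__main__":
    a = Snp("rsA", "GENE1", "chr1", 3, "UTR")
    b = Snp("rsB", "GENE2", "chr1", 7, "promoter")
    if is_non_physical(a, b):
        print(a.gene, b.gene, linkage_category(a, b))
```

Sorting the two regions by a fixed order keeps the category label independent of the order in which the SNPs are supplied, so a UTR/promoter pair and a promoter/UTR pair fall into the same class.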
The LDGIdb is a user-friendly resource that integrates non-physical SNP linkages and SNP-disease associations for studies of gene interactions in human diseases. With the help of the LDGIdb, it is possible to infer population-specific SNP linkages for more focused studies, an avenue that is potentially important for pharmacogenetics. Moreover, by referring to disease-association information such as GWAS data, the LDGIdb may help identify previously uncharacterized disease-associated gene interactions and potentially lead to new discoveries in studies of human diseases.