Fang Hai, Gough Julian
Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK.
Mol Biosyst. 2013 Jul;9(7):1686-96. doi: 10.1039/c3mb25495j. Epub 2013 Mar 5.
Protein domains are classified as units of structure, evolution and function, and thus form the molecular backbone of the biosphere. Although functional networks at the protein level have been reported to be of value in predicting diseases (phenotypes or drugs), they have not previously been applied at the sub-protein resolution (protein domain in this case). We herein introduce a domain network with a functional perspective. This network has nodes consisting of protein domains (at the superfamily/evolutionary level), with edges weighted by the semantic similarity according to domain-centric Gene Ontology (dcGO) annotations, which henceforth we call "dcGOnet". By globally exploring this network via a random walk, we demonstrate its predictive value on disease, drug, or phenotype-related ontologies. On cross-validation recovering ontology labels for domains, we achieve an overall area under the ROC curve of 89.0% for drugs, 87.3% for diseases, 87.6% for human phenotypes and 88.2% for mouse phenotypes. We show that the performance using global information from this network is significantly better than using local information, and also illustrate that the better performance is not sensitive to network size or the choice of algorithm parameters, and is universal to different ontologies. Based on the dcGOnet and its global properties, we further develop an approach to build a disease-drug-phenotype matrix. The predicted interconnections are statistically supported using a novel randomization procedure, and are also empirically supported by inspection for biological relevance. Most of the high-ranking predictions recover connections that are well known, but others uncover connections that have only suggestive or obscure support in the literature; we show that these are missed by simpler methods, in particular for drug-disease connections.
The value of this work is threefold: we describe a general methodology and make the software available, we provide the functional domain network itself, and the ranked drug-disease-phenotype matrix provides rich targets for investigation. All three can be found at .
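As an illustration of what "globally exploring this network via a random walk" could look like, the sketch below runs a random walk with restart over a small weighted graph. The abstract does not say which random-walk variant was used, so the restart formulation, the `restart` value, the function name and the four-node toy weights are all assumptions made for illustration; this is not the authors' released software.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state affinity to the seed nodes.

    weights : symmetric n x n list of lists of non-negative edge weights
              (standing in for semantic similarity between domains).
    seeds   : indices of nodes already carrying the label of interest.
    restart : probability of jumping back to a seed at each step
              (hypothetical default, not taken from the paper).
    """
    n = len(weights)
    # Column-normalise so that each column is a probability distribution.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [weights[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Initial distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:  # converged
            break
    return p


# Toy chain of four hypothetical domains: 0 -- 1 -- 2 -- 3 (weights are made up).
toy_weights = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.0, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
scores = random_walk_with_restart(toy_weights, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy case only node 0 is labelled, yet nodes 2 and 3 still receive non-zero scores through indirect paths. A purely local method that looks only at direct neighbours of the seed would score them zero, which is the kind of global versus local contrast the abstract refers to.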