Hamaneh Mehdi Bagheri, Yu Yi-Kuo
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.
PLoS One. 2014 Oct 31;9(10):e110936. doi: 10.1371/journal.pone.0110936. eCollection 2014.
Identifying similar diseases could provide a deeper understanding of their underlying causes, and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underlying molecular interactions and biological pathways. We have thus devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently their encoding genes) by using information flow from a disease to the protein interaction network and back. Similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations by using the weights assigned to the genes to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding the diseases in the same disease family and, more importantly, with the probability of their sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. We also show that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives for biological pathways than the diseases themselves. This lends support to the view that our similarity measure is a good indicator of the relatedness of the biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
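The similarity definition in the abstract (the cosine of the angle between two diseases' protein weight vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `cosine_similarity`, the four-protein toy vectors, and the choice to return 0.0 for an all-zero vector are assumptions made here for illustration, and the information-flow step that would produce the real weights is not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical weights over the same four proteins (illustrative values only).
disease_a = [0.5, 0.3, 0.0, 0.2]
disease_b = [0.4, 0.4, 0.1, 0.1]
disease_c = [0.0, 0.0, 1.0, 0.0]

sim_ab = cosine_similarity(disease_a, disease_b)
sim_ac = cosine_similarity(disease_a, disease_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In this toy example, diseases a and b place weight on overlapping proteins and score close to 1, while a and c share no weighted protein and score 0. With non-negative weights the measure always lies between 0 and 1.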