Prediction and validation of gene-disease associations using methods inspired by social network analyses.

Affiliation

Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America.

Publication information

PLoS One. 2013 May 1;8(5):e58977. doi: 10.1371/journal.pone.0058977. Print 2013.

Abstract

Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets, we can leverage statistical and machine learning methods to help achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction and is closely related to some recently proposed methods for gene-disease association inference. The second method, Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine whose features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies on two distinct data sets, namely OMIM phenotypes and drug-target interactions. We show that although both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall [corrected].
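
To make the Katz measure concrete, the following is a minimal sketch of a truncated Katz score on a toy network. It is not the paper's implementation: the function name `truncated_katz`, the damping factor `beta`, the walk-length cutoff `max_len`, and the four-node graph are all illustrative assumptions. The score between two nodes sums the number of walks of each length between them, with longer walks down-weighted by powers of `beta`.

```python
import numpy as np

def truncated_katz(adjacency, beta=0.1, max_len=3):
    """Return sum over l = 1..max_len of beta**l * A**l.

    Entry (i, j) counts walks from node i to node j, with walks of
    length l damped by beta**l. beta and max_len are illustrative
    choices, not values taken from the paper.
    """
    A = np.asarray(adjacency, dtype=float)
    scores = np.zeros_like(A)
    walks = np.eye(A.shape[0])
    for length in range(1, max_len + 1):
        walks = walks @ A  # A**length: number of walks of this length
        scores += (beta ** length) * walks
    return scores

# Hypothetical toy network: nodes 0-2 are genes, node 3 is a trait.
# Edges: gene0-gene1, gene1-gene2 (functional links), gene2-trait (known association).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

S = truncated_katz(A, beta=0.1, max_len=3)
gene_trait_scores = S[:3, 3]               # Katz score of each gene against the trait
ranking = np.argsort(-gene_trait_scores)   # candidate genes, best first
print(gene_trait_scores, ranking)
```

In this toy graph, genes closer to the trait in the network receive higher scores, so unlabeled genes can be ranked as candidates by their proximity to known associations.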

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9541/3641094/ef244a59298b/pone.0058977.g001.jpg