Kernel-based data fusion for gene prioritization.

Authors

De Bie Tijl, Tranchevent Léon-Charles, van Oeffelen Liesbeth M M, Moreau Yves

Affiliation

Department of Engineering Mathematics, University of Bristol, University Walk, BS8 1TR, Bristol, UK.

Publication

Bioinformatics. 2007 Jul 1;23(13):i125-32. doi: 10.1093/bioinformatics/btm187.

Abstract

MOTIVATION

Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet lab experiments, biologists must test the candidate genes starting with the most probable candidates. So far, biologists have relied on literature studies, extensive queries to multiple databases and hunches about expected properties of the disease gene to determine such an ordering. Recently, we have introduced the data mining tool ENDEAVOUR (Aerts et al., 2006), which performs this task automatically by relying on different genome-wide data sources, such as Gene Ontology, literature, microarray, sequence and more.

RESULTS

In this article, we present a novel kernel method that operates in the same setting: based on a number of different views on a set of training genes, a prioritization of test genes is obtained. We furthermore provide a thorough learning theoretical analysis of the method's guaranteed performance. Finally, we apply the method to the disease data sets on which ENDEAVOUR (Aerts et al., 2006) has been benchmarked, and report a considerable improvement in empirical performance.
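The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of the general setting: several views of the same genes are each turned into a kernel, the kernels are fused, and test genes are ranked by their similarity to the training genes. The uniform kernel weights, the RBF kernel, the centroid-style scoring rule, and the names `rbf_kernel` and `prioritize` are all assumptions made for illustration; they are not the authors' method, which may choose the weights and the scoring function differently.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def prioritize(train_views, test_views, weights=None, gamma=1.0):
    """Rank test genes by fused-kernel similarity to the training genes.

    train_views / test_views: lists of arrays, one per data source (view);
    row i of every array describes the same gene i.
    Assumption: uniform weights unless given; a real method might learn them.
    """
    n_views = len(train_views)
    if weights is None:
        weights = np.full(n_views, 1.0 / n_views)
    # Fused test-vs-train kernel: weighted sum of the per-view kernels.
    fused = sum(
        w * rbf_kernel(X_test, X_train, gamma)
        for w, X_train, X_test in zip(weights, train_views, test_views)
    )
    # Score = mean similarity to the training genes, i.e. the inner product
    # with the training centroid in the fused feature space.
    scores = fused.mean(axis=1)
    ranking = np.argsort(-scores)  # indices of test genes, best candidate first
    return ranking, scores

# Toy data (hypothetical): 5 training genes seen through two views.
rng = np.random.default_rng(0)
train_views = [rng.normal(0.0, 0.1, (5, 3)), rng.normal(0.0, 0.1, (5, 2))]
# Two test genes: gene 0 is far from the training genes, gene 1 is close.
test_views = [
    np.array([[5.0, 5.0, 5.0], [0.0, 0.0, 0.0]]),
    np.array([[5.0, 5.0], [0.0, 0.0]]),
]
ranking, scores = prioritize(train_views, test_views)
```

On this toy input the test gene that lies close to the training genes in both views is ranked ahead of the distant one. Summing kernels is a common way to fuse heterogeneous sources because a non-negative weighted sum of valid kernels is itself a valid kernel.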

AVAILABILITY

The MATLAB code used in the empirical results will be made publicly available.
