Nicotra Luca, Micheli Alessio
Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy.
Artif Intell Med. 2009 Feb-Mar;45(2-3):125-34. doi: 10.1016/j.artmed.2008.08.007. Epub 2008 Sep 26.
Modeling phylogenetic interactions is an open issue in many computational biology problems. In the context of gene function prediction, we introduce a class of kernels for structured data that leverage a hierarchical probabilistic model of the phylogeny among species.
We derive three kernels belonging to this setting: a sufficient statistics kernel, a Fisher kernel, and a probability product kernel. The new kernels are used in the context of support vector machine learning. The kernels' adaptivity is obtained by estimating the parameters of a tree-structured model of evolution, using as observed data phylogenetic profiles that encode the presence or absence of specific genes in a set of fully sequenced genomes.
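As an illustration of the Fisher kernel idea only, the following minimal sketch fits a tree-structured model to binary phylogenetic profiles and compares two profiles through the gradients of their log-likelihoods. It is not the authors' implementation and rests on simplifying assumptions that are not in the abstract: the toy tree `PARENT`, the function names, and the example profiles are hypothetical; every node of the tree is treated as observed (a model of evolution would normally have unobserved ancestral nodes that require inference); and the Fisher information matrix is approximated by the identity.

```python
# Hypothetical toy tree over five nodes: PARENT[i] is the parent of node i,
# and -1 marks the root. This topology is an assumption for illustration.
PARENT = [-1, 0, 0, 1, 1]


def fit_cpts(profiles, parent=PARENT, alpha=1.0):
    """Estimate theta[i][a] = P(x_i = 1 | parent state a) with Laplace smoothing.

    The root has no parent, so both of its entries hold the same value,
    the smoothed estimate of P(x_root = 1).
    """
    theta = [[0.5, 0.5] for _ in parent]
    for i in range(len(parent)):
        for a in (0, 1):
            # Keep the profiles whose parent node is in state a (all of them for the root).
            rows = [x for x in profiles if parent[i] < 0 or x[parent[i]] == a]
            ones = sum(x[i] for x in rows)
            theta[i][a] = (ones + alpha) / (len(rows) + 2 * alpha)
    return theta


def fisher_score(x, theta, parent=PARENT):
    """Gradient of log P(x | theta) with respect to every theta[i][a], flattened."""
    score = []
    for i in range(len(parent)):
        # Only the parameter matching the observed parent state gets a nonzero
        # gradient; the root's single parameter is stored in slot 0.
        a = 0 if parent[i] < 0 else x[parent[i]]
        t = theta[i][a]
        grad = [0.0, 0.0]
        grad[a] = x[i] / t - (1 - x[i]) / (1 - t)
        score.extend(grad)
    return score


def fisher_kernel(x, y, theta, parent=PARENT):
    """Inner product of Fisher scores (identity approximation of the Fisher information)."""
    ux = fisher_score(x, theta, parent)
    uy = fisher_score(y, theta, parent)
    return sum(u * v for u, v in zip(ux, uy))


# Hypothetical presence/absence profiles: one row per gene, one column per tree node.
profiles = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
theta = fit_cpts(profiles)
gram = [[fisher_kernel(x, y, theta) for y in profiles] for x in profiles]
```

The kernel is adaptive in the sense that `theta` is estimated from the profiles themselves, so the same pair of profiles can receive a different similarity under a different data set. A Gram matrix such as `gram` could then be passed to a support vector machine that accepts precomputed kernels.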
We report results obtained in predicting the functional class of the proteins of the budding yeast Saccharomyces cerevisiae, which compare favorably with a standard vector-based kernel and with a non-adaptive tree kernel function. A further comparative analysis is performed to assess the impact of the different components of the proposed approach.
We show that the key features of the proposed kernels are the adaptivity to the input domain and the ability to deal with structured data interpreted through a graphical model representation.