Vignes Matthieu, Forbes Florence
BioSS at the Scottish Crop Research Institute, Invergowrie, Dundee, Scotland, UK.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):260-70. doi: 10.1109/TCBB.2007.70248.
Clustering genes into groups that share common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms have been proposed, in particular to analyze gene expression data, but most of them either treat genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that has the advantage of accounting for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph in which the nodes represent the genes and the edges are weighted according to the available interaction information. Being probabilistic, the model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point out the gain from using such an approach.
The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.
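To make the combination of individual and pairwise data concrete, here is a minimal Python sketch of clustering on a weighted gene graph. It is not the authors' C++ software and does not reproduce their estimation procedure. It assumes unit-variance spherical Gaussian distributions for the individual data and a Potts-style term that rewards agreement along weighted edges, and it swaps in a simple iterated conditional modes (ICM) sweep for inference. The names `hmrf_icm`, `beta`, and `init_means`, and the toy data, are hypothetical.

```python
def hmrf_icm(x, edges, init_means, beta=1.0, max_iter=50):
    """Cluster items with a Gaussian data term plus a weighted Potts term.

    x          : list of feature vectors, one per gene (individual data)
    edges      : list of (i, j, weight) tuples (pairwise data)
    init_means : initial cluster centres, one vector per cluster
    beta       : strength of the pairwise term (0 disables the graph)
    """
    n, k = len(x), len(init_means)
    means = [list(m) for m in init_means]

    # Adjacency list of the undirected weighted graph.
    nbrs = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    def data_cost(i, c):
        # Negative log-density of a unit-variance Gaussian, up to a constant.
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x[i], means[c]))

    # Start from the nearest initial mean, ignoring the graph.
    labels = [min(range(k), key=lambda c: data_cost(i, c)) for i in range(n)]

    for _ in range(max_iter):
        # Re-estimate each cluster mean from its current members.
        for c in range(k):
            members = [x[i] for i in range(n) if labels[i] == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]

        # ICM sweep: each gene takes the label with the lowest local energy.
        changed = False
        for i in range(n):
            def energy(c):
                agree = sum(w for j, w in nbrs[i] if labels[j] == c)
                return data_cost(i, c) - beta * agree
            best = min(range(k), key=energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels, means


# Toy data: gene 6 has an intermediate expression value but interacts
# with genes 0, 1, and 2.
x = [[0.0], [0.2], [0.1], [2.0], [2.2], [2.1], [1.2]]
edges = [(6, 0, 1.0), (6, 1, 1.0), (6, 2, 1.0)]
labels_graph, means_graph = hmrf_icm(x, edges, [[0.0], [2.0]], beta=1.0)
labels_plain, means_plain = hmrf_icm(x, edges, [[0.0], [2.0]], beta=0.0)
```

With `beta=0.0` the graph is ignored and gene 6 follows its expression value alone. With `beta=1.0` the three weighted edges pull it into the cluster of its interaction partners, which is the kind of effect the pairwise data are meant to capture.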