Xiao Guanghua, Pan Wei
Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA.
J Bioinform Comput Biol. 2005 Dec;3(6):1371-89. doi: 10.1142/s0219720005001612.
Predicting the biological functions of genes is an important problem in basic biology research, with applications in drug discovery and gene therapy. Previous studies have shown that either gene expression data or protein-protein interaction data alone can be used to predict gene function. In particular, clustering of gene expression profiles has been widely used for this purpose. In this paper, we first propose a new method for gene function prediction using protein-protein interaction data, designed to facilitate combination with prediction results based on clustering gene expression profiles. We then propose a new method that combines the prediction results from the two data sources by weighting the evidence provided by each. Using protein-protein interaction data downloaded from the GRID database and published gene expression profiles from 300 microarray experiments for the yeast S. cerevisiae, we show in a cross-validated analysis of the MIPS gene annotations that this combined analysis improves predictive performance over using either data source alone. Finally, we propose a logistic regression method that is flexible enough to combine information from any number of data sources while remaining computationally feasible.
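The abstract does not give the form of the logistic regression model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each gene has one evidence score per data source for a given function (here a hypothetical expression-based score and a hypothetical interaction-based score, both in [0, 1]) and fits a logistic model on those scores by plain gradient descent. The function names, the toy scores, and the labels are all made up for illustration and are not taken from the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, scores):
    # weights = [bias, w_1, ..., w_k]; scores = one evidence score per data source.
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], scores))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit [bias, w_1, ..., w_k] by batch gradient descent on the mean log-loss."""
    k = len(rows[0])
    n = len(rows)
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for scores, y in zip(rows, labels):
            err = predict_proba(weights, scores) - y
            grad[0] += err
            for j, x in enumerate(scores):
                grad[j + 1] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data (NOT from the paper): one row per gene,
# columns = (expression-based score, interaction-based score).
toy_scores = [
    (0.9, 0.8), (0.7, 0.2), (0.2, 0.9), (0.8, 0.7),  # genes with the function
    (0.1, 0.2), (0.3, 0.1), (0.2, 0.3), (0.4, 0.2),  # genes without it
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(toy_scores, toy_labels)
strong = predict_proba(weights, (0.9, 0.9))  # both sources support the function
weak = predict_proba(weights, (0.1, 0.1))    # neither source supports it
```

Because the model has one coefficient per score column, adding a further data source in this sketch only means adding another column to each row, which is one way to read the abstract's claim of flexibility across any number of sources.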