Kustra Rafal, Shioda Romy, Zhu Mu
Public Health Sciences, University of Toronto, Toronto, ON, Canada.
BMC Bioinformatics. 2006 Apr 21;7:216. doi: 10.1186/1471-2105-7-216.
Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this approach is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories.
We propose a factor analysis model (FAM) for functional genomics and present a two-step algorithm that uses genome-wide expression data for yeast and a subset of Gene Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to that of the current best approach, while our total computation time is shorter by a factor of 4000. We discuss the unique challenges in evaluating the performance of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance.
Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions.
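The general idea of a two-step latent-factor predictor can be illustrated with a minimal sketch. This is not the authors' FAM algorithm, whose details the abstract does not give. It assumes, purely for illustration, that step one summarizes each gene's expression profile by a single latent score (here the leading principal component, a crude stand-in for a fitted factor) and that step two regresses a 0/1 functional annotation on that score to rank unannotated genes. All function names and the toy data are hypothetical.

```python
# Illustrative sketch only: NOT the authors' FAM algorithm.
# Step 1: one latent score per gene (leading principal component,
#         used here as a simple stand-in for a fitted factor).
# Step 2: least-squares regression of a 0/1 annotation on that score.

def center_columns(rows):
    """Subtract each column (condition) mean from the expression matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def leading_loadings(rows, iters=100):
    """Power iteration on X^T X; returns a unit-length loading vector.
    Assumes the data are not all zero and the fixed start vector is not
    orthogonal to the leading direction."""
    p = len(rows[0])
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        xv = [sum(x * w for x, w in zip(r, v)) for r in rows]            # X v
        w_new = [sum(s * r[j] for s, r in zip(xv, rows)) for j in range(p)]  # X^T (X v)
        norm = sum(w * w for w in w_new) ** 0.5
        v = [w / norm for w in w_new]
    return v

def factor_scores(rows, v):
    """Project each gene's centered profile onto the loading vector."""
    return [sum(x * w for x, w in zip(r, v)) for r in rows]

def fit_line(scores, labels):
    """Ordinary least squares of labels on scores; returns (slope, intercept)."""
    n = len(scores)
    ms, my = sum(scores) / n, sum(labels) / n
    sxy = sum((s - ms) * (y - my) for s, y in zip(scores, labels))
    sxx = sum((s - ms) ** 2 for s in scores)
    slope = sxy / sxx
    return slope, my - slope * ms

def predict_membership(expr, labels):
    """expr: profiles for all genes; labels: 0/1 annotations for the
    first len(labels) genes. Returns scores for the remaining genes."""
    centered = center_columns(expr)
    v = leading_loadings(centered)
    scores = factor_scores(centered, v)
    k = len(labels)
    slope, intercept = fit_line(scores[:k], labels)
    return [intercept + slope * s for s in scores[k:]]

# Toy data (hypothetical): 8 genes x 4 conditions.
EXPR = [
    [2.0, 1.8, -1.9, -2.1], [1.7, 2.2, -2.0, -1.8], [2.1, 1.9, -1.7, -2.2],   # annotated, in category
    [-2.0, -1.9, 2.1, 1.8], [-1.8, -2.1, 1.9, 2.0], [-2.2, -1.7, 2.0, 1.9],   # annotated, not in category
    [1.9, 2.0, -2.1, -1.8],   # unannotated, resembles the category
    [-1.9, -2.0, 1.8, 2.1],   # unannotated, resembles the others
]
LABELS = [1, 1, 1, 0, 0, 0]

predictions = predict_membership(EXPR, LABELS)
print(predictions)
```

Because the regression slope absorbs the arbitrary sign of the loading vector, the predicted scores do not depend on which direction the power iteration converges to. A real analysis would use several factors, a proper factor analysis fit, and many functional categories at once.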