Gene expression prediction by soft integration and the elastic net: best performance of the DREAM3 gene expression challenge.
Affiliation
Department of Science and Technology, Linköping University, Norrköping, Sweden.
Publication information
PLoS One. 2010 Feb 16;5(2):e9134. doi: 10.1371/journal.pone.0009134.
BACKGROUND: Predicting gene expression is an important endeavour in computational systems biology. It offers a way to explore how drugs affect the system and provides a framework for finding which genes are interrelated in a given process. A practical problem, however, is how to assess and discriminate among the various algorithms that have been developed for this purpose. To address this, the DREAM project issued a challenge in 2008 to predict gene expression values, and here we present the algorithm that achieved the best performance.
METHODOLOGY/PRINCIPAL FINDINGS: We develop an algorithm by exploring various regression schemes with different model selection procedures. The most effective scheme turns out to be least squares with a penalty term of a recently developed form called the "elastic net". Key components of the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge, and the use of transcription factor binding data to guide the inference process towards known interactions. Also important is a cross-validation procedure in which each form of external data is used only to the extent that it increases the expected performance.
CONCLUSIONS/SIGNIFICANCE: Our algorithm demonstrates both that information useful for predicting gene expression levels can be extracted from large-scale expression data and that integrating different data sources improves the inference. We believe the former is an important message to those still hesitant about the potential of computational approaches, while the latter is an important part of the way forward for computational systems biology.
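The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "soft integration" can be approximated by giving external samples a weight between 0 and 1 in an elastic-net-penalised least-squares fit, and that the weight is picked by cross-validation on the challenge data alone, so that a weight of 0 (ignore the external data) is chosen whenever the external data do not help. The function names (`elastic_net`, `cv_error`, `choose_weight`), the weight grid, the penalty parameterisation and the absence of an intercept (data assumed centred) are all illustrative assumptions. The transcription factor binding prior is not modelled here.

```python
import numpy as np


def elastic_net(X, y, sample_weight=None, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Weighted elastic-net least squares by cyclic coordinate descent.

    Minimises (1 / (2 W)) * sum_i w_i * (y_i - x_i . b)^2
              + alpha * l1_ratio * ||b||_1
              + 0.5 * alpha * (1 - l1_ratio) * ||b||_2^2,
    where W = sum_i w_i. No intercept is fitted (data assumed centred).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    total = w.sum()
    root_w = np.sqrt(w)
    Xw = X * root_w[:, None]          # rows scaled by sqrt(weight)
    resid = y * root_w                # weighted residual; beta starts at zero
    col_sq = (Xw ** 2).sum(axis=0) / total
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue              # predictor carries no information
            resid += Xw[:, j] * beta[j]          # take predictor j out of the fit
            rho = Xw[:, j] @ resid / total
            shrunk = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            beta[j] = shrunk / (col_sq[j] + alpha * (1.0 - l1_ratio))
            resid -= Xw[:, j] * beta[j]          # put the updated predictor back
    return beta


def cv_error(X_main, y_main, X_ext, y_ext, w, k=5, **kw):
    """k-fold CV error on the challenge data; external rows get weight w."""
    idx = np.arange(len(y_main))
    errors = []
    for held in np.array_split(idx, k):
        train = np.setdiff1d(idx, held)
        X = np.vstack([X_main[train], X_ext])
        y = np.concatenate([y_main[train], y_ext])
        sw = np.concatenate([np.ones(len(train)), np.full(len(y_ext), w)])
        beta = elastic_net(X, y, sw, **kw)
        errors.append(np.mean((y_main[held] - X_main[held] @ beta) ** 2))
    return float(np.mean(errors))


def choose_weight(X_main, y_main, X_ext, y_ext, grid=(0.0, 0.25, 0.5, 1.0), **kw):
    """Return the external-data weight with the lowest CV error, plus all scores."""
    scores = {w: cv_error(X_main, y_main, X_ext, y_ext, w, **kw) for w in grid}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
    X_main = rng.normal(size=(40, 5))
    y_main = X_main @ beta_true + 0.1 * rng.normal(size=40)
    X_ext = rng.normal(size=(100, 5))
    y_ext = X_ext @ beta_true + 0.1 * rng.normal(size=100)
    best_w, scores = choose_weight(X_main, y_main, X_ext, y_ext, alpha=0.01)
    print("chosen external-data weight:", best_w)
```

Including 0 in the weight grid is what lets the cross-validation step discard an external data source entirely. One possible extension, again an assumption and not something stated in the abstract, would be per-predictor penalty factors that penalise known transcription factor targets less strongly.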