Bioinformatics Graduate Program, and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA.
BMC Bioinformatics. 2012 Mar 23;13:46. doi: 10.1186/1471-2105-13-46.
Identifying active causal regulators is a crucial problem in understanding disease mechanisms and finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. However, these methods require either very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can boost a method's ability to find regulators.
We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis with a specific type of literature-derived causal relationship. Instead of investigating the co-expression level between regulators and their regulatees, we focus on the coherence among the regulatees of a regulator. Using simulated datasets, we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets, we were able to recover both well-known and as yet undescribed active regulators for each disease population. The results are presented as a rank-ordered list of regulators and reveal both single and higher-order regulatory relationships.
CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest, and it represents a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression.
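The core idea of scoring a regulator by the coherence of its regulatees, rather than by its own co-expression with them, can be illustrated with a minimal sketch. The abstract does not specify the statistic or the null model, so everything here is an assumption: coherence is taken to be the mean pairwise Pearson correlation among regulatees, significance is estimated against random gene sets of the same size, and the names `coherence_score`, `rank_regulators`, `REG_A`, and `REG_B` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def coherence_score(expr, idx):
    """Mean pairwise Pearson correlation among the genes (rows) in idx.

    Assumption: this is one plausible coherence measure, not necessarily
    the statistic used by CSA.
    """
    corr = np.corrcoef(expr[idx])
    upper = np.triu_indices(len(idx), k=1)  # each gene pair counted once
    return float(corr[upper].mean())


def rank_regulators(expr, regulatee_sets, n_perm=200, seed=0):
    """Rank regulators by how coherent their regulatee sets are.

    expr           : genes x samples expression matrix
    regulatee_sets : dict mapping regulator name -> list of row indices,
                     standing in for a literature-derived causal network
    Returns a list of (regulator, coherence, empirical p-value),
    most significant first.
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    results = []
    for regulator, idx in regulatee_sets.items():
        idx = np.asarray(idx)
        if idx.size < 2:
            continue  # coherence is undefined for fewer than two regulatees
        observed = coherence_score(expr, idx)
        # Assumed null model: random gene sets of the same size.
        null = np.array([
            coherence_score(
                expr, rng.choice(n_genes, size=idx.size, replace=False)
            )
            for _ in range(n_perm)
        ])
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        results.append((regulator, observed, float(p_value)))
    return sorted(results, key=lambda r: (r[2], -r[1]))


# Toy data: 60 genes, 40 samples. Genes 0-9 share a latent signal,
# mimicking the regulatees of an active regulator.
toy_rng = np.random.default_rng(1)
expr = toy_rng.normal(size=(60, 40))
expr[:10] += 2.0 * toy_rng.normal(size=40)

ranking = rank_regulators(
    expr,
    {"REG_A": list(range(10)), "REG_B": list(range(10, 20))},
)
for name, score, p_value in ranking:
    print(f"{name}: coherence={score:.3f}, p={p_value:.4f}")
```

In this toy setup the hypothetical regulator whose regulatees share a signal ranks first. Note that the regulator's own expression is never used, which is what distinguishes this approach from regulator-regulatee co-expression.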