Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization.

Author information

Trescher Saskia, Münchmeyer Jannes, Leser Ulf

Affiliation

Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany.

Publication information

BMC Syst Biol. 2017 Mar 27;11(1):41. doi: 10.1186/s12918-017-0419-z.

Abstract

BACKGROUND

Gene regulation is one of the most important cellular processes. It is indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Regulatory mechanisms can be elucidated with a multitude of experimental methods, yet integrating the resulting heterogeneous, large, and noisy data sets into comprehensive, tissue- or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed that model genome-wide gene regulation as sets of (linear) equations over the activities and relationships of transcription factors, genes, and other factors. Subsequent optimization finds the parameters that minimize the divergence between predicted and measured expression intensities. In various settings, these methods have produced promising results in estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they differ vastly in the types of experimental data they integrate, the background knowledge required to apply them, the granularity of their regulatory model, the concrete paradigm used to solve the optimization problem, and the data sets used for evaluation.
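The idea of "linear equations plus optimization" can be illustrated with a deliberately minimal sketch. It assumes a single linear model, expression E ≈ B·A. Here B is a hypothetical genes × TFs matrix of prior regulatory weights, and A holds the unknown TF activities per sample. A is estimated by ordinary least squares, minimizing ||E - B·A||² with NumPy's `np.linalg.lstsq`. This is not any of the reviewed methods, which integrate further data types, constraints, and solvers. The function name `estimate_tf_activity` and the toy data are hypothetical.

```python
import numpy as np

def estimate_tf_activity(binding: np.ndarray, expression: np.ndarray) -> np.ndarray:
    """Least-squares estimate of TF activities A in expression ~= binding @ A.

    binding:    genes x TFs matrix (hypothetical prior regulatory weights)
    expression: genes x samples matrix of measured expression
    returns:    TFs x samples matrix of estimated activities
    """
    # lstsq minimizes ||binding @ A - expression||^2 separately for each sample column
    activity, *_ = np.linalg.lstsq(binding, expression, rcond=None)
    return activity

# Toy example (hypothetical data): 4 genes, 2 TFs, 3 samples
binding = np.array([[1.0, 0.0],
                    [1.0, 1.0],
                    [0.0, 1.0],
                    [0.5, 0.0]])
true_activity = np.array([[2.0, 0.0, 1.0],
                          [1.0, 3.0, 0.5]])
expression = binding @ true_activity  # noise-free "measurements"

est = estimate_tf_activity(binding, expression)
print(np.round(est, 6))
```

With noise-free toy data and a B of full column rank, the estimate recovers `true_activity` up to floating-point error. With real, noisy measurements the residual is nonzero, and regularization or additional constraints may be needed.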

RESULTS

Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of these methods on publicly available data sets.

CONCLUSIONS

The results show that all methods seem to find biologically relevant information. However, we also observe that the overlap between the methods' results is very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, but also to identify common shortcomings and necessary extensions, enabling research focused on the critical points.
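The abstract does not state how result overlap was measured. One common way to quantify it is sketched below. This sketch assumes that each method outputs a score per regulator. It then computes the Jaccard index between the top-k regulators of two methods. The TF names, scores, and helper names (`top_k`, `jaccard`) are hypothetical illustrations, not the paper's evaluation protocol.

```python
def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the names of the k highest-scoring regulators (ties broken by name)."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical regulator scores from two methods
method_a = {"TF1": 0.9, "TF2": 0.8, "TF3": 0.1, "TF4": 0.05}
method_b = {"TF1": 0.7, "TF3": 0.6, "TF4": 0.1, "TF2": 0.05}

overlap = jaccard(top_k(method_a, 2), top_k(method_b, 2))
print(overlap)
```

Here the top-2 sets are {TF1, TF2} and {TF1, TF3}, so the overlap is 1/3. Values near 0 across many method pairs would indicate the kind of low mutual agreement described above.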

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0177/5369021/1c0b3a3aa1d9/12918_2017_419_Fig1_HTML.jpg