Systems Biology Lab, Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France; Functional Genetics of Infectious Diseases Unit, Department Genomes and Genetics, Institut Pasteur, Paris, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France.
Systems Biology Lab, Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France.
Methods. 2018 Jan 1;132:19-25. doi: 10.1016/j.ymeth.2017.08.008. Epub 2017 Sep 21.
Biological processes often manifest themselves as coordinated changes across modules, i.e., sets of interacting genes. Commonly, the high dimensionality of genome-scale data prevents the visual identification of such modules, and a straightforward computational search through a set of known pathways is a limited approach. Therefore, tools for the data-driven computational identification of modules in gene interaction networks have become popular components of visualization and visual analytics workflows. However, many such tools are known to produce modules that are large and therefore hard to interpret biologically. Here, we show that the empirically known tendency towards large modules can be attributed to a statistical bias present in many module identification tools, and we discuss possible remedies from a mathematical perspective. In the current absence of a straightforward practical solution, we outline our view of best practices for the use of the existing tools.