Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia.
Otzma Analytics Pty Ltd, Bentleigh East, Victoria, Australia.
BMC Bioinformatics. 2020 Apr 29;21(1):165. doi: 10.1186/s12859-020-3441-x.
Network motifs are connectivity structures that occur significantly more often than expected by chance, and are thought to play important roles in complex biological networks, for example in gene regulation, interactomes, and metabolomes. Network motifs may also become pivotal in the rational design and engineering of complex biological systems underpinning the field of synthetic biology. Distinguishing true motifs from arbitrary substructures, however, remains a challenge.
Here we demonstrate, both theoretically and empirically, that implicit assumptions in mainstream methods for motif identification do not necessarily hold. As a consequence, motif studies using these methods are less able to distinguish spurious results from events of true statistical significance than is often presented. We show that these difficulties cannot be overcome without revising the statistical methods used to identify motifs.
Present-day methods for discovering network motifs, and indeed even the methods for defining what they are, rely critically on a set of incorrect assumptions, casting doubt on the scientific validity of motif-driven discoveries. The implications of these findings are therefore far-reaching across diverse areas of biology.
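For readers unfamiliar with how "more often than expected by chance" is usually made operational, the following is a minimal sketch of one widely used style of procedure: count a subgraph in the observed network, count it again in degree-preserving randomizations, and summarize the difference as a z-score. It is illustrative only and is not taken from the paper. The function names (`count_ffl`, `rewire`, `motif_z_score`), the choice of the feed-forward loop as the subgraph, and all parameter values are assumptions made for this example. Reading a z-score as a significance level presumes that the null counts are roughly normally distributed; this is one assumption commonly built into such procedures, and the abstract does not say which assumptions the paper examines.

```python
import random
import statistics


def count_ffl(edges):
    """Count feed-forward loops (a->b, b->c, a->c) in a directed edge list."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    total = 0
    for a, outs in succ.items():
        for b in outs:
            if b == a:
                continue
            for c in succ.get(b, ()):
                # c must be a third node that a also points to
                if c != a and c != b and c in outs:
                    total += 1
    return total


def rewire(edges, n_attempts, rng):
    """Degree-preserving randomization by repeated edge-swap attempts.

    Each attempt picks edges (a, b) and (c, d) and tries to replace them
    with (a, d) and (c, b). Attempts that would create a self-loop or a
    duplicate edge are skipped, so every node keeps its in/out-degree.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def motif_z_score(edges, n_random=200, attempts_per_edge=10, seed=0):
    """Return (observed count, null counts, z-score) for feed-forward loops.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [
        count_ffl(rewire(edges, attempts_per_edge * len(edges), rng))
        for _ in range(n_random)
    ]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    # z is undefined when every randomized network gives the same count
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, null, z
```

Returning the full list of null counts alongside the z-score is a deliberate choice in this sketch: it lets the empirical null distribution be inspected directly rather than relying on the single summary statistic.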