Wang Zichen, Clark Neil R, Ma'ayan Avi
Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY, 10029, USA.
BD2K-LINCS Data Coordination and Integration Center, New York, USA.
BMC Syst Biol. 2015 Jun 6;9:26. doi: 10.1186/s12918-015-0173-z.
Thousands of biological and biomedical investigators study the functional role of single genes and their protein products in normal physiology and in disease. The findings from these studies are reported in research articles that stimulate new research. It is now established that a complex regulatory network controls human cellular fate, and this community of researchers is continually unraveling the topology of this network. Attempts to integrate the results of this accumulated knowledge have produced literature-based protein-protein interaction networks (PPINs) and pathway databases. These databases are widely used by the community to analyze new data collected from emerging genome-wide studies, with the assumption that the data within these literature-based databases are the ground truth and contain no biases. While suspicion of research focus biases is growing, concrete proof of such biases is still missing. They are difficult to prove because the real PPINs are mostly unknown.
Here we analyzed the longitudinal discovery process of literature-based mammalian and yeast PPINs and observed that these networks are discovered non-uniformly. The pattern of discovery is related to a theoretical concept proposed by Kauffman called "expanding the adjacent possible". We introduce a network discovery model that explicitly includes the space of possibilities in the form of a true underlying PPIN.
Our model strongly suggests that research focus biases exist in the observed discovery dynamics of these networks. In summary, more care should be taken when using PPIN databases to analyze newly acquired data, and when drawing on prior knowledge to design new experiments.
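The idea of a discovery model built on a true underlying network can be illustrated with a toy simulation. The sketch below is not the authors' model; it is a minimal illustration under stated assumptions: the "true" PPIN is a random graph, one interaction is revealed per step, and a hypothetical `bias` exponent controls how strongly discovery favors proteins that already have known interactions. The function names `make_true_network` and `discover` and all parameters are invented for this example.

```python
import random
from itertools import combinations


def make_true_network(n_nodes, n_edges, rng):
    """Hypothetical 'true' PPIN: a random set of undirected edges (assumption)."""
    all_pairs = list(combinations(range(n_nodes), 2))
    return set(rng.sample(all_pairs, n_edges))


def discover(true_edges, n_steps, bias, rng):
    """Reveal edges of the true network, one per step.

    bias = 0.0 samples undiscovered edges uniformly. A larger bias weights
    each undiscovered edge by (1 + known degree of its endpoints) ** bias,
    so already-studied proteins attract further discoveries.
    """
    undiscovered = sorted(true_edges)
    known_degree = {}
    discovered = []
    for _ in range(min(n_steps, len(undiscovered))):
        weights = [
            (1 + known_degree.get(u, 0) + known_degree.get(v, 0)) ** bias
            for u, v in undiscovered
        ]
        idx = rng.choices(range(len(undiscovered)), weights=weights, k=1)[0]
        u, v = undiscovered.pop(idx)
        discovered.append((u, v))
        known_degree[u] = known_degree.get(u, 0) + 1
        known_degree[v] = known_degree.get(v, 0) + 1
    return discovered, known_degree


rng = random.Random(0)
true_edges = make_true_network(n_nodes=50, n_edges=200, rng=rng)
uniform_edges, uniform_degree = discover(true_edges, 100, bias=0.0, rng=rng)
focused_edges, focused_degree = discover(true_edges, 100, bias=2.0, rng=rng)
```

Comparing the two `known_degree` dictionaries (for example, how many distinct proteins appear in each) gives a rough sense of how a focus bias can concentrate the observed network on a subset of nodes, even though both runs sample from the same underlying network.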