Krauthammer Michael, Kra Pauline, Iossifov Ivan, Gomez Shawn M, Hripcsak George, Hatzivassiloglou Vasileios, Friedman Carol, Rzhetsky Andrey
Department of Medical Informatics, Columbia University, New York, NY 10032, USA.
Bioinformatics. 2002;18 Suppl 1:S249-57. doi: 10.1093/bioinformatics/18.suppl_1.s249.
Knowledge of interactions between molecules in living cells is indispensable for theoretical analysis and practical applications in modern genomics and molecular biology. Building networks of such interactions relies on the assumption that the correct molecular interactions are known or can be identified by reading a few research articles. However, this assumption does not necessarily hold, because truth is instead an emergent property of many potentially conflicting facts. This paper explores the processes of knowledge generation and publishing in the molecular biology literature through modelling and analysis of real molecular interaction data. The data analysed in this article were automatically extracted from 50,000 research articles in molecular biology using GeneWays, a computer system that contains a natural language processing module. The paper indicates that scientists associate the truthfulness of statements with the relative importance (connectedness) of the substances under study, which reveals a potential selection bias in the reporting of research results. To understand the statistical properties of the life cycle of biological facts reported in research articles, we formulate a stochastic model describing the generation and propagation of knowledge about molecular interactions through scientific publications. We hope that such a model can, in the future, be useful for automatically producing consensus views of molecular interaction data.