Hoffmann Robert, Krallinger Martin, Andres Eduardo, Tamames Javier, Blaschke Christian, Valencia Alfonso
National Center for Biotechnology, CNB-CSIC, Darwin 3, Cantoblanco, 28049 Madrid, Spain.
Sci STKE. 2005 May 10;2005(283):pe21. doi: 10.1126/stke.2832005pe21.
The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.