Perez-Iratxeta Carolina, Astola Nagore, Ciccarelli Francesca D, Shah Parantu K, Bork Peer, Andrade Miguel A
European Molecular Biology Laboratory, Heidelberg, Germany.
Appl Bioinformatics. 2003;2(3):189-91.
Entries in biological databases are usually linked to scientific references. To generate those links and keep them up to date, database maintainers must continuously scan the scientific literature and select the references relevant to each database entry. The continuous growth of both the scientific literature and the biological databases makes this task very hard. We present a protocol intended to assist in updating the existing set of literature (abstract) links of a single database entry with new references. It takes the set of MEDLINE neighbour references of the abstracts already linked to the entry and evaluates their relevance against that existing set of abstracts. To test the applicability of the algorithm, we ran a simple benchmark of the system using the references associated with the entries of a protein domain database. Human experts judged the references that the algorithm scored high to be more relevant to the database entry than those it scored low, suggesting that the algorithm is useful.
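The two steps of the protocol (collect the MEDLINE neighbours of the linked abstracts, then score them against the linked set) can be illustrated with a minimal Python sketch. The abstract does not specify the scoring function, so the cosine similarity over word counts used here is an assumption chosen for illustration, not the authors' method. The names `rank_candidates`, `neighbours`, and `abstracts` are hypothetical, and the neighbour lists are assumed to have been retrieved beforehand.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(linked, neighbours, abstracts):
    """Rank neighbour references of the abstracts linked to one entry.

    linked:     IDs of abstracts already linked to the database entry
    neighbours: hypothetical mapping, ID -> list of neighbour IDs
    abstracts:  hypothetical mapping, ID -> abstract text
    Returns (score, ID) pairs, highest score first.
    """
    linked_set = set(linked)

    # Step 1: pool the neighbours of every linked abstract,
    # dropping references that are already linked.
    candidates = set()
    for ref_id in linked:
        candidates.update(neighbours.get(ref_id, []))
    candidates -= linked_set

    # Step 2 (assumed scoring): build one word-count profile from the
    # linked abstracts and compare each candidate against it.
    profile = Counter()
    for ref_id in linked:
        profile.update(tokenize(abstracts.get(ref_id, "")))

    scored = [
        (cosine(Counter(tokenize(abstracts.get(c, ""))), profile), c)
        for c in candidates
    ]
    # Sort by descending score; ties are broken by ID for determinism.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

A curator would then review the top of the ranked list first. Any other document-similarity measure could replace `cosine` without changing the structure of the protocol.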