Preiss Judita, Stevenson Mark
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, UK.
BMC Bioinformatics. 2017 May 31;18(Suppl 7):249. doi: 10.1186/s12859-017-1641-9.
Literature-based discovery (LBD) automatically infers connections between concepts that the literature has missed. It is often assumed that LBD generates more information than can reasonably be examined.
We present a detailed analysis of the quantity of hidden knowledge produced by an LBD system and of the effect that various filtering approaches have on that quantity. Filtering combined with single- or multi-step chains of linking terms is investigated on all articles in PubMed.
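As a rough illustration of how linking-term chains and filtering interact, the following minimal sketch builds a co-occurrence graph and collects concepts reachable from a start term through one or more linking terms but never directly connected to it. It is not the system described here: the function names, the co-occurrence representation, and the `stop_terms` filter are assumptions made for the example, and the toy data only echoes the well-known fish oil and Raynaud disease discovery.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Map each concept to the set of concepts it shares a document with."""
    graph = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(set(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hidden_pairs(graph, start, max_steps=1, stop_terms=frozenset()):
    """Concepts linked to `start` via 1..max_steps linking terms only.

    `stop_terms` is a hypothetical filter: terms in it are never used as
    linking terms and are never proposed as targets.
    """
    direct = set(graph.get(start, ()))
    visited = {start}
    frontier = {start}
    candidates = set()
    for _ in range(max_steps):
        # Linking terms: unvisited, unfiltered neighbours of the frontier.
        linking = set()
        for term in frontier:
            linking |= graph.get(term, set())
        linking -= stop_terms
        linking -= visited
        # Targets: neighbours of the linking terms that `start` has
        # never co-occurred with.
        for term in linking:
            candidates |= graph.get(term, set())
        candidates -= direct | {start} | set(stop_terms)
        visited |= linking
        frontier = linking
    return candidates


# Toy data (illustrative only, not taken from PubMed).
docs = [
    ["fish oil", "blood viscosity"],
    ["blood viscosity", "raynaud disease"],
    ["raynaud disease", "vasoconstriction"],
]
graph = cooccurrence_graph(docs)
single = hidden_pairs(graph, "fish oil", max_steps=1)
multi = hidden_pairs(graph, "fish oil", max_steps=2)
filtered = hidden_pairs(graph, "fish oil", stop_terms={"blood viscosity"})
```

In this toy graph, allowing a second step adds one more candidate, while filtering out the single linking term removes every candidate, which is the kind of trade-off the analysis quantifies at scale.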
The evaluation uses both replication of existing discoveries, which provides justification for multi-step linking chain knowledge in specific cases, and timeslicing, which gives a large-scale measure of performance.
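Timeslicing can be sketched as follows, under the common reading of the term: hidden knowledge is generated from documents published before a cutoff, and the gold standard is the set of concept pairs that first co-occur after it. The function names and the choice of precision and recall as the reported scores are assumptions for illustration, not details taken from the paper.

```python
from itertools import combinations


def cooccurring_pairs(documents):
    """Return the set of unordered concept pairs that share a document."""
    pairs = set()
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            pairs.add((a, b))
    return pairs


def timeslice_scores(predicted, before_docs, after_docs):
    """Score predicted pairs against pairs that first appear after the cutoff."""
    known = cooccurring_pairs(before_docs)
    # Gold standard: pairs seen after the cutoff but never before it.
    gold = cooccurring_pairs(after_docs) - known
    # Normalise pair order and drop anything already known before the cutoff.
    predicted = {tuple(sorted(p)) for p in predicted} - known
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Toy example with made-up concept labels.
before = [["a", "b"], ["b", "c"]]
after = [["a", "c"], ["c", "d"]]
precision, recall = timeslice_scores({("c", "a"), ("a", "d")}, before, after)
```

Here one of the two predictions is confirmed by the later documents, and one of the two new pairs is recovered, so both scores are 0.5.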
While the quantity of hidden knowledge generated by LBD can be vast, we demonstrate that (a) intelligent filtering can greatly reduce the number of hidden knowledge pairs generated, (b) for a specific term, the number of single-step connections can be manageable, and