Adapt Centre and School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland.
School of Medicine, Trinity College Dublin, Dublin, Ireland.
BMC Bioinformatics. 2024 Sep 16;25(1):303. doi: 10.1186/s12859-024-05881-9.
Literature-based discovery (LBD) aims to help researchers identify relations between concepts that are worthy of further investigation by text-mining the biomedical literature. While the LBD literature is rich and the field is considered mature, standard practice in the evaluation of LBD methods is methodologically poor and has not kept pace with the rest of the field. The lack of a properly designed, adequately sized benchmark dataset hinders the progress of the field and its development into applications usable by biomedical experts.
This work presents a method for mining past discoveries from the biomedical literature. It leverages the impact made by a discovery, using descriptive statistics to detect surges in the prevalence of a relation across time. The validity of the method is tested against a baseline representing the state-of-the-art "time-sliced" method.
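To make the idea of detecting a surge in the prevalence of a relation concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify which descriptive statistics are used, so the rule below (a year is flagged when the relation's relative frequency exceeds the mean of the preceding years by `k` standard deviations) is an assumption, as are the function name `first_surge_year` and the parameters `window` and `k`.

```python
from statistics import mean, stdev


def first_surge_year(counts_by_year, totals_by_year, window=5, k=3.0):
    """Return the first year flagged as a surge, or None if there is none.

    counts_by_year: {year: number of documents mentioning the relation}
    totals_by_year: {year: total number of documents published that year}
    Hypothetical rule (an assumption, not taken from the paper): a year is a
    surge when its relative frequency exceeds the mean of the previous
    `window` years by more than `k` sample standard deviations.
    """
    years = sorted(counts_by_year)
    # Normalise by yearly corpus size so that overall growth of the
    # literature is not mistaken for a surge in one relation.
    freq = [counts_by_year[y] / totals_by_year[y] for y in years]
    for i in range(window, len(years)):
        past = freq[i - window:i]
        threshold = mean(past) + k * stdev(past)
        if freq[i] > threshold:
            return years[i]
    return None


# Toy data (invented for illustration only).
counts = {2000: 1, 2001: 1, 2002: 2, 2003: 1, 2004: 1, 2005: 1, 2006: 20}
totals = {year: 1000 for year in counts}
surge = first_surge_year(counts, totals)
```

On this toy series the prevalence is flat until the final year, so the flagged year is the last one. In a sketch like this, the year returned would serve as the time stamp of the candidate discovery; the window length and threshold would need tuning against known discoveries.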
This method allows the collection of a large number of time-stamped discoveries. These can be used for LBD evaluation, alleviating the long-standing issue of inadequate evaluation. It might also pave the way for more fine-grained LBD methods, which could exploit the diversity of these past discoveries to train supervised models. Finally, the dataset (or some future version of it inspired by our method) could be used as a methodological tool for systematic reviews. For this purpose, we provide an online exploration tool, available at https://brainmend.adaptcentre.ie/.