Canada's Michael Smith Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada.
University of British Columbia, Vancouver, BC V6T 1Z1, Canada.
Bioinformatics. 2018 Feb 15;34(4):652-659. doi: 10.1093/bioinformatics/btx613.
The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods use knowledge graphs built with text mining to infer future associations between biomedical concepts that are likely to appear in new publications. These predictions are a valuable resource for researchers exploring a research topic. Current prediction methods are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph is needed if knowledge discovery is to become a tool that researchers use frequently.
We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using cooccurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations.
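The idea of scoring unseen concept pairs through a reduced representation can be illustrated with a minimal sketch. It assumes a small, symmetric, binary co-occurrence matrix and a hand-picked rank; the function names `svd_scores` and `rank_candidates`, the toy matrix, and the choice of rank are hypothetical and are not taken from the authors' repository, whose preprocessing and thresholding may differ.

```python
import numpy as np


def svd_scores(cooccurrence: np.ndarray, rank: int) -> np.ndarray:
    """Reconstruct the matrix from its top `rank` singular components."""
    u, s, vt = np.linalg.svd(cooccurrence, full_matrices=False)
    # Scale the kept left singular vectors by their singular values,
    # then project back to the original concept-by-concept space.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def rank_candidates(cooccurrence: np.ndarray, rank: int):
    """Score concept pairs that never co-occur, highest score first."""
    scores = svd_scores(cooccurrence, rank)
    n = cooccurrence.shape[0]
    candidates = [
        (i, j, float(scores[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if cooccurrence[i, j] == 0  # only pairs not yet seen together
    ]
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Toy data (an assumption for illustration): 5 concepts, 1 = co-occur.
toy = np.array(
    [
        [1, 1, 1, 0, 0],
        [1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0],
        [0, 1, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ],
    dtype=float,
)

ranked = rank_candidates(toy, rank=2)
for i, j, score in ranked:
    print(f"concepts {i} and {j}: score {score:.3f}")
```

Because the low-rank reconstruction draws on every row and column of the matrix, a pair with no direct co-occurrence can still receive a high score when the two concepts share similar neighbours elsewhere in the graph. For a matrix of realistic size, a sparse truncated solver would likely be a better fit than the dense `np.linalg.svd` used here.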
All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery.
Supplementary data are available at Bioinformatics online.