GO2PUB: Querying PubMed with semantic expansion of gene ontology terms.

Author information

Bettembourg Charles, Diot Christian, Burgun Anita, Dameron Olivier

Affiliations

UMR936, INSERM, Université de Rennes 1, 2 av. Léon Bernard, F-35043 Rennes, France.

Publication information

J Biomed Semantics. 2012 Sep 7;3(1):7. doi: 10.1186/2041-1480-3-7.

Abstract

BACKGROUND

With the development of high-throughput methods of gene analysis, there is a growing need for mining tools that retrieve relevant articles from PubMed. As PubMed grows, literature searches become more complex and time-consuming, so automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with the names, symbols and synonyms of genes annotated by a GO term of interest or one of its descendants.
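The enrichment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GO2PUB's actual implementation: the GO identifiers (`GO:TOY1` and so on), the gene labels, and the function names are all hypothetical, and the way gene labels are combined with keywords (an OR group of labels, ANDed with the keywords, each restricted with PubMed's `[Title/Abstract]` field tag) is a guess at one reasonable query shape.

```python
from collections import deque

# Toy GO graph: parent term -> direct child terms (hypothetical IDs).
CHILDREN = {
    "GO:TOY1": ["GO:TOY2", "GO:TOY3"],
    "GO:TOY2": ["GO:TOY4"],
    "GO:TOY3": ["GO:TOY4"],
    "GO:TOY4": [],
}

# Toy annotations: term -> gene names, symbols and synonyms (hypothetical).
ANNOTATIONS = {
    "GO:TOY2": ["geneA", "gene A protein"],
    "GO:TOY4": ["geneB", "GB1"],
}


def descendants(term, children):
    """Return the term itself plus every term reachable below it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # the graph is a DAG, so a term can be reached twice
                seen.add(child)
                queue.append(child)
    return seen


def gene_labels(term, children, annotations):
    """Collect labels of genes annotated by the term or any descendant."""
    labels = set()
    for t in descendants(term, children):
        labels.update(annotations.get(t, []))
    return sorted(labels)


def build_query(term, keywords, children, annotations):
    """Build a PubMed query string (assumed shape: gene labels OR-ed, keywords AND-ed)."""
    parts = []
    labels = gene_labels(term, children, annotations)
    if labels:
        parts.append("(" + " OR ".join(f'"{g}"[Title/Abstract]' for g in labels) + ")")
    if keywords:
        parts.append("(" + " AND ".join(f'"{k}"[Title/Abstract]' for k in keywords) + ")")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query("GO:TOY3", ["liver"], CHILDREN, ANNOTATIONS))
```

Here `GO:TOY3` has no direct annotations, yet the query still contains the labels of the gene annotated by its descendant `GO:TOY4`. That inheritance through the graph is the point of the expansion.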

RESULTS

GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the results and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries that uses the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of the results of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Expert agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed 40% and PubMed 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. To determine whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to that of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These proportions were 77% and 40%, respectively, for the first queries, and 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries.
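The evaluation figures above rest on two simple computations: inter-rater agreement (kappa) and the share of relevant articles each tool returns. The sketch below shows one way to compute them. It assumes Cohen's kappa for two raters, $\kappa = (p_o - p_e) / (1 - p_e)$, and it assumes that "relevant articles" means the pool of articles judged relevant across all tools; the abstract does not spell out either detail. The ratings, tool names and article IDs are toy data, not the study's.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def pooled_recall(returned, relevant):
    """Fraction of the pooled relevant articles that each tool returned."""
    return {tool: len(ids & relevant) / len(relevant) for tool, ids in returned.items()}


def exclusive_share(returned, relevant):
    """Fraction of the pooled relevant articles returned by one tool only."""
    shares = {}
    for tool, ids in returned.items():
        # Union of everything the other tools returned.
        others = set().union(*(v for k, v in returned.items() if k != tool))
        shares[tool] = len((ids & relevant) - others) / len(relevant)
    return shares


if __name__ == "__main__":
    expert_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = relevant, 0 = not relevant
    expert_2 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    print(cohen_kappa(expert_1, expert_2))

    returned = {"tool_a": {1, 2, 3, 4}, "tool_b": {3, 4, 5}, "tool_c": {6, 7}}
    relevant = {1, 3, 4, 5, 6}
    print(pooled_recall(returned, relevant))
    print(exclusive_share(returned, relevant))
```

Because the recall denominator is the pooled relevant set, the per-tool percentages need not sum to 100%: an article returned by two tools counts toward both.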

CONCLUSIONS

We demonstrated that using genes annotated by either the GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that the two tools are complementary. The analysis of the randomly generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org.
