University of Applied Sciences and Arts of Western Switzerland (HES-SO).
Swiss Institute of Bioinformatics, Switzerland.
Stud Health Technol Inform. 2022 May 25;294:876-877. doi: 10.3233/SHTI220614.
We present an analysis of the supplementary materials of PubMed Central (PMC) articles and show their importance for indexing and searching biomedical literature, in particular for the emerging field of genomic medicine. On a subset of PMC articles, we use text mining methods to extract MeSH terms from abstracts, full texts, and text-based supplementary materials. We find that the recall of MeSH annotations increases by about 5.9 percentage points (a relative increase of about 20%) when supplementary materials are considered in addition to abstracts. We further compare the supplementary material annotations with full-text annotations and find that the recall of MeSH terms increases by 1.5 percentage points (a relative increase of about 3%). Additionally, we analyze genetic variant mentions in abstracts and full texts and compare them with mentions found in text-based supplementary files. We find that the vast majority (about 99%) of variants are found in text-based supplementary files. In conclusion, we suggest that supplementary data should receive more attention from the information retrieval community, in particular in the life and health sciences.
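The distinction between a gain in percentage points and a relative gain can be illustrated with a minimal sketch. It assumes that recall is computed per article as the fraction of reference MeSH terms recovered by text mining; the abstract does not describe the exact evaluation procedure, and the term sets and function names below are hypothetical. Taken at face value, a gain of 5.9 points that corresponds to roughly +20% relative would imply an abstract-only recall of around 30%.

```python
def recall(extracted: set[str], reference: set[str]) -> float:
    """Fraction of reference MeSH terms that appear among the extracted terms."""
    if not reference:
        return 0.0
    return len(extracted & reference) / len(reference)


def recall_gain(baseline: float, enriched: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = (enriched - baseline) * 100
    relative = (enriched - baseline) / baseline * 100 if baseline else float("inf")
    return points, relative


# Hypothetical data for a single article (not taken from the study).
reference_terms = {"Humans", "Mutation", "Genomics", "Neoplasms",
                   "Polymorphism, Single Nucleotide"}
abstract_terms = {"Humans", "Neoplasms", "Mice"}
supplementary_terms = {"Humans", "Mutation"}

baseline = recall(abstract_terms, reference_terms)
enriched = recall(abstract_terms | supplementary_terms, reference_terms)
points, relative = recall_gain(baseline, enriched)

print(f"abstract only:       {baseline:.1%}")
print(f"with supplementary:  {enriched:.1%}")
print(f"gain: {points:.1f} points ({relative:.0f}% relative)")
```

In this toy case, the supplementary file contributes one reference term that the abstract misses, so the absolute and relative gains differ markedly because the baseline recall is low.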