Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
BMC Bioinformatics. 2012 Sep 27;13:249. doi: 10.1186/1471-2105-13-249.
MEDLINE®/PubMed® indexes over 20 million biomedical articles and provides curated annotation of their contents using a controlled vocabulary known as Medical Subject Headings (MeSH). The MeSH vocabulary, developed over more than 50 years, provides broad coverage of topics across biomedical research. Distilling the essential biomedical themes for a topic of interest from the relevant literature is important both for understanding the significance of related concepts and for discovering new relationships.
We introduce a novel method for determining enriched curator-assigned MeSH annotations in a set of papers associated with a topic, such as a gene, an author or a disease. We generate MeSH Over-representation Profiles (MeSHOPs) to quantitatively summarize the annotations in a form convenient for further computational analysis and visualization. Based on a hypergeometric distribution of assigned terms, MeSHOPs statistically account for the prevalence of the associated biomedical annotation while highlighting terms that are unusually prevalent relative to a specified background. MeSHOPs can be visualized using word clouds, providing a succinct quantitative graphical representation of the relative importance of terms. Using the publication dates of articles, MeSHOPs track changing patterns of annotation over time. Since MeSHOPs are quantitative vectors, they can be compared using standard techniques such as hierarchical clustering. The reliability of MeSHOP annotations is assessed based on the capacity to re-derive the subset of the Gene Ontology annotations with equivalent MeSH terms.
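As a rough illustration of the hypergeometric over-representation idea described above, the following minimal sketch scores each MeSH term by the probability of seeing at least the observed number of annotated articles in the topic set, given the term's frequency in the background. The function names (`hypergeom_sf`, `meshop`) and the input layout (plain dictionaries of counts) are assumptions made for this example and are not taken from the published implementation. Exact integer arithmetic is used for clarity; at MEDLINE scale a log-space computation or a statistics library would likely be preferable.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: articles in the background set
    K: background articles annotated with the term
    n: articles in the topic set
    k: topic articles annotated with the term
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass over all outcomes at least as extreme as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def meshop(topic_counts, background_counts, n_topic, n_background):
    """Hypothetical helper: map each term to its over-representation p-value.

    topic_counts / background_counts: dict of term -> number of annotated articles.
    """
    return {
        term: hypergeom_sf(k, n_background, background_counts[term], n_topic)
        for term, k in topic_counts.items()
    }
```

For instance, with a toy background of 10 articles of which 3 carry a term, a topic set of 4 articles containing 2 with that term yields a p-value of 1/3, so the term would not be considered notably enriched.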
MeSHOPs allow quantitative measurement of the degree of association between any entity and the annotated medical concepts, based directly on the relevant primary literature. Comparing MeSHOPs allows entities to be related based on shared medical themes in their literature. A web interface is provided for generating and visualizing MeSHOPs.
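One simple way to relate two entities through their profiles is to treat each profile as a sparse vector of term scores and compute a similarity between them. The sketch below uses cosine similarity over -log10(p) scores. This choice of transform and measure is an assumption made for illustration, not necessarily the comparison used by the authors, and the names `to_scores` and `cosine` are hypothetical.

```python
from math import log10, sqrt


def to_scores(profile, floor=1e-300):
    """Convert term -> p-value into term -> -log10(p), flooring tiny p-values."""
    return {term: -log10(max(p, floor)) for term, p in profile.items()}


def cosine(a, b):
    """Cosine similarity between two sparse score vectors (dict of term -> score)."""
    # Only terms present in both profiles contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Pairwise similarities of this kind (or the corresponding distances) could then be passed to any standard hierarchical clustering routine.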