A knowledge-driven approach to biomedical document conceptualization.

Tsinghua-Southampton Web Science Laboratory at Shenzhen, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China.

Artif Intell Med. 2010 Jun;49(2):67-78. doi: 10.1016/j.artmed.2010.02.005. Epub 2010 Apr 3.

OBJECTIVE

Biomedical document conceptualization is the process of clustering biomedical documents based on ontology-represented domain knowledge. The result of this process is a representation of the biomedical documents as a set of key concepts and their relationships. Most clustering methods cluster documents based on invariant domain knowledge. The objective of this work is to develop an effective method for clustering biomedical documents based on various user-specified ontologies, so that users can exploit the concept structures of documents more effectively.

METHODS

We develop a flexible framework that allows users to specify knowledge bases in the form of ontologies. Based on the user-specified ontologies, we develop a key concept induction algorithm, which uses latent semantic analysis to identify key concepts and cluster documents. We also develop a corpus-related ontology generation algorithm to generate the concept structures of documents.
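
To make the role of latent semantic analysis concrete, the following is a minimal sketch, not the authors' key concept induction algorithm. It assumes a small, made-up concept-by-document count matrix, applies a truncated singular value decomposition, names each latent dimension after the concept with the largest loading, and assigns each document to its strongest dimension. The concept names, the counts, and the function `lsa_key_concepts` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are ontology concepts, columns are documents.
concepts = ["Neoplasms", "Apoptosis", "Insulin", "Glucose"]

# counts[i, j] = number of mentions of concept i in document j (made-up values).
counts = np.array([
    [4, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 2, 3],
], dtype=float)


def lsa_key_concepts(matrix, names, k):
    """Rank-k LSA: return (key concept per latent dimension, cluster label per document)."""
    # Thin SVD: matrix = u @ diag(s) @ vt, singular values sorted in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]

    # Label each latent dimension with the concept that loads most strongly on it.
    key = [names[int(np.argmax(np.abs(u_k[:, d])))] for d in range(k)]

    # Document coordinates in the latent space (k x n_docs); assign each document
    # to the dimension on which it has the largest absolute coordinate.
    doc_coords = s_k[:, None] * vt_k
    labels = np.argmax(np.abs(doc_coords), axis=0).tolist()
    return key, labels


key, labels = lsa_key_concepts(counts, concepts, k=2)
print(key, labels)
```

In this toy matrix the first two documents mention only the first two concepts and the last two documents mention only the last two, so the two retained dimensions separate the documents into those two groups. Swapping in a different ontology would change only the rows of the matrix, which is the kind of flexibility the framework describes.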

RESULTS

Using two biomedical datasets, we evaluate the proposed method against five other clustering algorithms. The proposed method outperforms the five other algorithms in terms of key concept identification. On the first biomedical dataset, our method achieves F-measure values of 0.7294 and 0.5294 based on the MeSH ontology and the Gene Ontology (GO), respectively. On the second biomedical dataset, our method achieves F-measure values of 0.6751 and 0.6746 based on the MeSH ontology and GO, respectively. Both results outperform the five other algorithms in terms of F-measure. Based on the MeSH ontology and GO, the generated corpus-related ontologies show informative conceptual structures.
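
For readers unfamiliar with the metric, the F-measure is commonly defined as the weighted harmonic mean of precision and recall. The abstract does not state which variant was used for clustering evaluation, so the sketch below assumes the standard definition computed from true-positive, false-positive, and false-negative counts. The function name `f_measure` and the example counts are illustrative only.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, and false negatives."""
    if tp == 0:
        # With no true positives the score is defined as 0 here,
        # which also avoids division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: precision 0.75, recall 0.5.
print(round(f_measure(tp=6, fp=2, fn=6), 4))
```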

CONCLUSIONS

The proposed method enables users to specify the domain knowledge used to exploit the conceptual structures of biomedical document collections. In addition, the proposed method is able to extract key concepts and cluster documents with relatively high precision.

Similar articles

A knowledge-driven approach to biomedical document conceptualization.

Artif Intell Med. 2010 Jun;49(2):67-78. doi: 10.1016/j.artmed.2010.02.005. Epub 2010 Apr 3.

GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.

J Biomed Inform. 2010 Feb;43(1):31-40. doi: 10.1016/j.jbi.2009.07.006. Epub 2009 Jul 25.

A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora.

J Biomed Inform. 2010 Dec;43(6):1020-35. doi: 10.1016/j.jbi.2010.09.008. Epub 2010 Sep 24.

Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity.

Bioinformatics. 2009 Aug 1;25(15):1944-51. doi: 10.1093/bioinformatics/btp338. Epub 2009 Jun 3.

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text.

J Biomed Inform. 2003 Dec;36(6):462-77. doi: 10.1016/j.jbi.2003.11.003.

PuReD-MCL: a graph-based PubMed document clustering methodology.

Bioinformatics. 2008 Sep 1;24(17):1935-41. doi: 10.1093/bioinformatics/btn318. Epub 2008 Jul 1.

Dynamic sub-ontology evolution for traditional Chinese medicine web ontology.

J Biomed Inform. 2008 Oct;41(5):790-805. doi: 10.1016/j.jbi.2008.05.008. Epub 2008 May 23.

Query expansion with a medical ontology to improve a multimodal information retrieval system.

Comput Biol Med. 2009 Apr;39(4):396-403. doi: 10.1016/j.compbiomed.2009.01.012. Epub 2009 Mar 6.

Automatic extension of Gene Ontology with flexible identification of candidate terms.

Bioinformatics. 2006 Mar 15;22(6):665-70. doi: 10.1093/bioinformatics/btl010. Epub 2006 Jan 21.

Cited by

The role of a multicentre data repository in ocular inflammation: The Ocular Autoimmune Systemic Inflammatory Infectious Study (OASIS).

Eye (Lond). 2023 Oct;37(15):3084-3096. doi: 10.1038/s41433-023-02472-5. Epub 2023 Mar 14.

Mapping biological entities using the longest approximately common prefix method.

BMC Bioinformatics. 2014 Jun 14;15:187. doi: 10.1186/1471-2105-15-187.

Towards semantic search and inference in electronic medical records: An approach using concept-based information retrieval.

Australas Med J. 2012;5(9):482-8. doi: 10.4066/AMJ.2012.1362. Epub 2012 Sep 30.
