Jimeno-Yepes Antonio, Berlanga-Llavori Rafael, Rebholz-Schuhmann Dietrich
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, U.K.
Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:7073-8. doi: 10.1109/IEMBS.2009.5333359.
Ontological resources such as controlled vocabularies, taxonomies and ontologies from the OBO Foundry are used to represent biomedical domain knowledge. Developing such resources is time-consuming. Once finished, they contribute to the standardization of information representation, the interoperability of IT solutions, literature analysis and knowledge discovery. Text mining comprises IT solutions for information retrieval (IR) and information extraction (IE). IR technology exploits ontological resources to select the documents that best match a query, for example by indexing the literature content with concept IDs or by disambiguating the terms in the query. IE solutions use ontological labels to identify concepts in the text. The text passages that denote concepts are then used either to annotate named entities or to relate the named entities to each other. Knowledge discovery (KD) solutions use the concepts identified in the scientific literature to relate entities to each other, e.g. to identify gene-disease relations based on shared molecular functions.
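The IE and IR steps described above can be illustrated with a minimal sketch. It is not the authors' system: the lexicon, the concept IDs (C001, C002), the sample documents and the function names annotate and build_index are all hypothetical and chosen only for illustration. The sketch matches ontological labels in text by simple case-insensitive dictionary lookup, preferring longer labels, and then indexes documents by the concept IDs found.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon mapping labels to concept IDs.
# Both the labels and the IDs are illustrative, not taken from a real ontology.
LEXICON = {
    "apoptosis": "C001",
    "programmed cell death": "C001",  # synonym of the same concept
    "breast cancer": "C002",
}


def annotate(text, lexicon=LEXICON):
    """Return sorted (start, end, concept_id) spans for label matches.

    Longer labels are matched first, and a match is skipped if it
    overlaps a span that has already been annotated.
    """
    spans = []
    taken = set()  # character positions already covered by a match
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            positions = set(range(m.start(), m.end()))
            if positions & taken:
                continue
            taken |= positions
            spans.append((m.start(), m.end(), lexicon[label]))
    return sorted(spans)


def build_index(docs, lexicon=LEXICON):
    """Build an inverted index from concept ID to the set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _, _, concept_id in annotate(text, lexicon):
            index[concept_id].add(doc_id)
    return index


# Hypothetical two-document collection.
docs = {
    "d1": "Programmed cell death is impaired in breast cancer.",
    "d2": "Apoptosis regulates tissue homeostasis.",
}
index = build_index(docs)
```

Because both "apoptosis" and "programmed cell death" map to the same hypothetical concept ID, a query for that concept retrieves both documents even though they share no label. A realistic system would additionally need term disambiguation and handling of morphological variants, which this sketch omits.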