Two biomedical sublanguages: a description based on the theories of Zellig Harris.

Authors

Friedman Carol, Kra Pauline, Rzhetsky Andrey

Affiliation

Department of Medical Informatics, Columbia University, VC5, Vanderbilt Building, 622 West 168th Street, New York, NY 10032-3720, USA.

Publication information

J Biomed Inform. 2002 Aug;35(4):222-35. doi: 10.1016/s1532-0464(03)00012-1.

Abstract

Natural language processing (NLP) systems have been developed to provide access to the tremendous body of data and knowledge that is available in the biomedical domain in the form of natural language text. These NLP systems are valuable because they can encode and amass the information in the text so that it can be used by other automated processes to improve patient care and our understanding of disease processes and treatments. Zellig Harris proposed a theory of sublanguage that laid the foundation for natural language processing in specialized domains. He hypothesized that the informational content and structure form a specialized language that can be delineated in the form of a sublanguage grammar. The grammar can then be used by a language processor to capture and encode the salient information and relations in text. In this paper, we briefly summarize his language and sublanguage theories. In addition, we summarize our prior research, which is associated with the sublanguage grammars we developed for two different biomedical domains. These grammars illustrate how Harris' theories provide a basis for the development of language processing systems in the biomedical domain. The two domains and their associated sublanguages discussed are: the clinical domain, where the text consists of patient reports, and the biomolecular domain, where the text consists of complete journal articles.
