Friedman Carol, Kra Pauline, Rzhetsky Andrey
Department of Medical Informatics, Columbia University, VC5, Vanderbilt Building, 622 West 168th Street, New York, NY 10032-3720, USA.
J Biomed Inform. 2002 Aug;35(4):222-35. doi: 10.1016/s1532-0464(03)00012-1.
Natural language processing (NLP) systems have been developed to provide access to the tremendous body of data and knowledge that is available in the biomedical domain in the form of natural language text. These NLP systems are valuable because they can encode and amass the information in the text so that it can be used by other automated processes to improve patient care and our understanding of disease processes and treatments. Zellig Harris proposed a theory of sublanguage that laid the foundation for natural language processing in specialized domains. He hypothesized that the informational content and structure form a specialized language that can be delineated in the form of a sublanguage grammar. The grammar can then be used by a language processor to capture and encode the salient information and relations in text. In this paper, we briefly summarize his language and sublanguage theories. In addition, we summarize our prior research, which is associated with the sublanguage grammars we developed for two different biomedical domains. These grammars illustrate how Harris' theories provide a basis for the development of language processing systems in the biomedical domain. The two domains and their associated sublanguages discussed are: the clinical domain, where the text consists of patient reports, and the biomolecular domain, where the text consists of complete journal articles.