Two biomedical sublanguages: a description based on the theories of Zellig Harris.

Authors

Carol Friedman, Pauline Kra, Andrey Rzhetsky

Affiliation

Department of Medical Informatics, Columbia University, VC5, Vanderbilt Building, 622 West 168th Street, New York, NY 10032-3720, USA.

Publication information

J Biomed Inform. 2002 Aug;35(4):222-35. doi: 10.1016/s1532-0464(03)00012-1.

DOI:10.1016/s1532-0464(03)00012-1

PMID:12755517

Abstract

Natural language processing (NLP) systems have been developed to provide access to the tremendous body of data and knowledge that is available in the biomedical domain in the form of natural language text. These NLP systems are valuable because they can encode and amass the information in the text so that it can be used by other automated processes to improve patient care and our understanding of disease processes and treatments. Zellig Harris proposed a theory of sublanguage that laid the foundation for natural language processing in specialized domains. He hypothesized that the informational content and structure form a specialized language that can be delineated in the form of a sublanguage grammar. The grammar can then be used by a language processor to capture and encode the salient information and relations in text. In this paper, we briefly summarize his language and sublanguage theories. In addition, we summarize our prior research, which is associated with the sublanguage grammars we developed for two different biomedical domains. These grammars illustrate how Harris' theories provide a basis for the development of language processing systems in the biomedical domain. The two domains and their associated sublanguages discussed are: the clinical domain, where the text consists of patient reports, and the biomolecular domain, where the text consists of complete journal articles.

Similar articles

Paraphrasing for condensation in journal abstracting.

J Biomed Inform. 2002 Aug;35(4):265-77. doi: 10.1016/s1532-0464(03)00016-9.

The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text.

J Biomed Inform. 2003 Dec;36(6):462-77. doi: 10.1016/j.jbi.2003.11.003.

Information extraction from biomedical text.

J Biomed Inform. 2002 Aug;35(4):260-4. doi: 10.1016/s1532-0464(03)00015-7.

The structure of science information.

J Biomed Inform. 2002 Aug;35(4):215-21. doi: 10.1016/s1532-0464(03)00011-x.

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches.

BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2105-7-S3-S2.

MaSTerClass: a case-based reasoning system for the classification of biomedical terms.

Bioinformatics. 2005 Jun 1;21(11):2748-58. doi: 10.1093/bioinformatics/bti338. Epub 2005 Feb 22.

MedScan, a natural language processing engine for MEDLINE abstracts.

Bioinformatics. 2003 Sep 1;19(13):1699-706. doi: 10.1093/bioinformatics/btg207.

Status of text-mining techniques applied to biomedical text.

Drug Discov Today. 2006 Apr;11(7-8):315-25. doi: 10.1016/j.drudis.2006.02.011.

Evaluation of Meta-1 for a concept-based approach to the automated indexing and retrieval of bibliographic and full-text databases.

Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S120-4.

Cited by

Toward Relieving Clinician Burden by Automatically Generating Progress Notes using Interim Hospital Data.

AMIA Annu Symp Proc. 2025 May 22;2024:1059-1068. eCollection 2024.

Clinical document corpora-real ones, translated and synthetic substitutes, and assorted domain proxies: a survey of diversity in corpus design, with focus on German text data.

JAMIA Open. 2025 May 14;8(3):ooaf024. doi: 10.1093/jamiaopen/ooaf024. eCollection 2025 Jun.

Coherence and comprehensibility: Large language models predict lay understanding of health-related content.

J Biomed Inform. 2025 Jan;161:104758. doi: 10.1016/j.jbi.2024.104758. Epub 2024 Dec 9.

A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records.

Comput Struct Biotechnol J. 2023 Aug 22;22:32-40. doi: 10.1016/j.csbj.2023.08.018. eCollection 2023.

The added value of text from Dutch general practitioner notes in predictive modeling.

J Am Med Inform Assoc. 2023 Nov 17;30(12):1973-1984. doi: 10.1093/jamia/ocad160.

Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease.

J Biomed Inform. 2023 Jun;142:104368. doi: 10.1016/j.jbi.2023.104368. Epub 2023 Apr 21.

Design considerations for a hierarchical semantic compositional framework for medical natural language understanding.

PLoS One. 2023 Mar 16;18(3):e0282882. doi: 10.1371/journal.pone.0282882. eCollection 2023.

A dataset for plain language adaptation of biomedical abstracts.

Sci Data. 2023 Jan 4;10(1):8. doi: 10.1038/s41597-022-01920-3.

A survey of automated methods for biomedical text simplification.

J Am Med Inform Assoc. 2022 Oct 7;29(11):1976-1988. doi: 10.1093/jamia/ocac149.

Sequence tagging for biomedical extractive question answering.

Bioinformatics. 2022 Aug 2;38(15):3794-3801. doi: 10.1093/bioinformatics/btac397.
