Johnson S B, Friedman C
Department of Medical Informatics, Columbia University, New York, USA.
Proc AMIA Annu Fall Symp. 1996:537-41.
Demographic data extracted from discharge summaries by natural language processing were compared with data gathered by a conventional hospital admitting system. Discrepancies were noted in names, age, sex, race, and ethnicity. Some differences are attributable to errors in collection: interaction with the patient, dictation, transcription, and data entry. Very few differences were due to errors in natural language processing. Other differences can be used to critique existing data or to enhance them with more detailed information. Discrepancies in data as elementary as patient demographics raise the issue of how to resolve conflicts when neither source is known to be more reliable. Clinical repositories can represent conflicting data from multiple sources, but clinical information systems must then bear the cost of increased complexity in the application programs that use the data.
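As an illustration of the comparison described in the abstract, the following is a minimal sketch of how two sources of demographic data might be compared field by field, keeping both conflicting values together with their source rather than choosing a winner. It is not the authors' method: the `Observation` record, the `find_discrepancies` function, the source labels, and the sample values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One demographic value plus the source it came from (hypothetical record)."""
    field: str
    value: str
    source: str


def find_discrepancies(admitting: dict, nlp: dict) -> dict:
    """Compare the fields present in both sources.

    Returns a dict mapping each disagreeing field to the observations from
    both sources, so that neither value is discarded. Comparison ignores
    case and surrounding whitespace (an assumption made for this sketch).
    """
    conflicts = {}
    for field in sorted(admitting.keys() & nlp.keys()):
        a, b = admitting[field], nlp[field]
        if a.strip().lower() != b.strip().lower():
            conflicts[field] = [
                Observation(field, a, "admitting"),
                Observation(field, b, "nlp"),
            ]
    return conflicts


# Made-up sample data, for illustration only.
admitting = {"name": "Jane Doe", "age": "67", "sex": "F"}
nlp = {"name": "jane doe", "age": "76", "sex": "F", "ethnicity": "Hispanic"}

conflicts = find_discrepancies(admitting, nlp)
for field, observations in conflicts.items():
    print(field, [(o.source, o.value) for o in observations])
```

Storing both observations with their provenance mirrors the abstract's point that a repository can hold conflicting data, while any application reading it must decide how to handle the conflict.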