



A reliability study for evaluating information extraction from radiology reports.

Authors

Hripcsak G, Kuperman G J, Friedman C, Heitjan D F

Affiliation

Columbia University, New York, New York, USA.

Publication

J Am Med Inform Assoc. 1999 Mar-Apr;6(2):143-50. doi: 10.1136/jamia.1999.0060143.

DOI:10.1136/jamia.1999.0060143
PMID:10094067
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC61353/
Abstract

GOAL

To assess the reliability of a reference standard for an information extraction task.

SETTING

Twenty-four physician raters from two sites and two specialties judged whether clinical conditions were present based on reading chest radiograph reports.

METHODS

Variance components, generalizability (reliability) coefficients, and the number of expert raters needed to generate a reliable reference standard were estimated.
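As a rough illustration of what estimating variance components and a generalizability coefficient involves, the sketch below handles the simplest case: a fully crossed cases-by-raters table with no missing ratings, analyzed with a two-way ANOVA without replication. This is an assumption-laden simplification, not the authors' model; the study also involved sites, specialties, and multiple conditions, none of which are modeled here. The function names `variance_components` and `g_coefficient` are hypothetical.

```python
def variance_components(scores):
    """Estimate variance components for a crossed cases x raters design.

    scores[i][j] is the rating given to case i by rater j (e.g. 1 = condition
    present, 0 = absent). Assumes every rater rated every case.
    """
    n_cases = len(scores)
    n_raters = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_cases * n_raters)

    case_means = [sum(row) / n_raters for row in scores]
    rater_means = [
        sum(scores[i][j] for i in range(n_cases)) / n_cases
        for j in range(n_raters)
    ]

    # Sums of squares for a two-way layout with one observation per cell.
    ss_cases = n_raters * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n_cases * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_cases - ss_raters

    ms_cases = ss_cases / (n_cases - 1)
    ms_raters = ss_raters / (n_raters - 1)
    ms_resid = ss_resid / ((n_cases - 1) * (n_raters - 1))

    # Expected-mean-square solutions; negative estimates are truncated to zero.
    return {
        "case": max((ms_cases - ms_resid) / n_raters, 0.0),
        "rater": max((ms_raters - ms_resid) / n_cases, 0.0),
        "residual": ms_resid,
    }


def g_coefficient(vc, n_raters=1):
    """Relative generalizability coefficient for the mean of n_raters raters."""
    return vc["case"] / (vc["case"] + vc["residual"] / n_raters)


# Toy data (made up for illustration): 4 cases, 3 raters, one disagreement.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
vc = variance_components(toy)
single = g_coefficient(vc, n_raters=1)
pooled = g_coefficient(vc, n_raters=3)
```

On the toy table, pooling three raters yields a higher coefficient than a single rater, which is the effect the study relies on when asking how many raters a reference standard needs.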

RESULTS

Per-rater reliability averaged across conditions was 0.80 (95% CI, 0.79-0.81). Reliability for the nine individual conditions varied from 0.67 to 0.97, with central line presence and pneumothorax the most reliable, and pleural effusion (excluding CHF) and pneumonia the least reliable. One to two raters were needed to achieve a reliability of 0.70, and six raters, on average, were required to achieve a reliability of 0.95. This was far more reliable than a previously published per-rater reliability of 0.19 for a more complex task. Differences between sites were attributable to changes to the condition definitions.
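The link between per-rater reliability and the number of raters needed can be sketched with the Spearman-Brown relation, which gives the reliability of the mean of k raters from the reliability of one. Using it here is an assumption: the abstract does not state the exact formula the authors applied, and the function names below are hypothetical. The inputs are the per-rater reliabilities reported above (0.80 on average, 0.67 to 0.97 across conditions).

```python
import math


def reliability_of_mean(per_rater, n_raters):
    """Spearman-Brown: reliability of the average of n_raters raters."""
    return n_raters * per_rater / (1 + (n_raters - 1) * per_rater)


def raters_needed(per_rater, target):
    """Smallest whole number of raters whose pooled reliability meets target."""
    n = target * (1 - per_rater) / (per_rater * (1 - target))
    # Small epsilon guards against float noise pushing an exact integer up.
    return max(1, math.ceil(n - 1e-9))


# Per-rater reliabilities taken from the Results paragraph.
for r in (0.67, 0.80, 0.97):
    print(r, raters_needed(r, 0.70), raters_needed(r, 0.95))
```

Under this relation, a target of 0.70 needs one rater at a per-rater reliability of 0.80 and two at 0.67, consistent with the "one to two raters" reported. For a target of 0.95 the formula gives five raters at 0.80 and ten at 0.67. The abstract's figure of six is reported as an average across conditions and comes from the authors' own generalizability analysis, so it need not match this simplified calculation.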

CONCLUSION

In these evaluations, physician raters were able to judge very reliably the presence of clinical conditions based on text reports. Once the reliability of a specific rater is confirmed, it would be possible for that rater to create a reference standard reliable enough to assess aggregate measures on a system. Six raters would be needed to create a reference standard sufficient to assess a system on a case-by-case basis. These results should help evaluators design future information extraction studies for natural language processors and other knowledge-based systems.
