
Using large clinical corpora for query expansion in text-based cohort identification.

Authors

Zhu Dongqing, Wu Stephen, Carterette Ben, Liu Hongfang

Affiliations

Department of Computer and Information Sciences, University of Delaware, 440 Smith Hall, Newark, DE 19716, USA.

Division of Biomedical Statistics and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.

Publication information

J Biomed Inform. 2014 Jun;49:275-81. doi: 10.1016/j.jbi.2014.03.010. Epub 2014 Mar 26.

DOI: 10.1016/j.jbi.2014.03.010
PMID: 24680983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4058413/
Abstract

In light of the heightened problems of polysemy, synonymy, and hyponymy in clinical text, we hypothesize that patient cohort identification can be improved by using a large, in-domain clinical corpus for query expansion. We evaluate the utility of four auxiliary collections for the Text REtrieval Conference task of IR-based cohort retrieval, considering the effects of collection size, the inherent difficulty of a query, and the interaction between the collections. Each collection was applied to aid in cohort retrieval from the Pittsburgh NLP Repository by using a mixture of relevance models. Measured by mean average precision, performance using any auxiliary resource (MAP=0.386 and above) is shown to improve over the baseline query likelihood model (MAP=0.373). Considering subsets of the Mayo Clinic collection, we found that after including 2.5 billion term instances, retrieval is not improved by adding more instances. However, adding the Mayo Clinic collection did improve performance significantly over any existing setup, with a system using all four auxiliary collections obtaining the best results (MAP=0.4223). Because optimal results in the mixture of relevance models would require selective sampling of the collections, the common sense approach of "use all available data" is inappropriate. However, we found that it was still beneficial to add the Mayo corpus to any mixture of relevance models. On the task of IR-based cohort identification, query expansion with the Mayo Clinic corpus resulted in consistent and significant improvements. As such, any IR query expansion with access to a large clinical corpus could benefit from the additional resource. Additionally, we have shown that more data is not necessarily better, implying that there is value in collection curation.
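
The abstract names a "mixture of relevance models" but gives no formula, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes an RM1-style relevance model per collection (term distributions of the top-ranked feedback documents, weighted by query likelihood) and a linear interpolation of those models with the original query. The function names, the additive smoothing constant, the mixture weights, and the toy documents are all hypothetical.

```python
from collections import Counter


def relevance_model(query, docs, top_k=2, eps=1e-6):
    """RM1-style estimate of P(w | R) from one collection (assumed form).

    docs is a list of non-empty token lists. Each document is scored by
    query likelihood with additive smoothing; the term distributions of
    the top_k documents are averaged, weighted by normalized likelihood.
    """
    vocab_size = len({w for d in docs for w in d})
    scored = []
    for d in docs:
        counts = Counter(d)
        likelihood = 1.0
        for q in query:
            likelihood *= (counts[q] + eps) / (len(d) + eps * vocab_size)
        scored.append((likelihood, counts, len(d)))
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:top_k]
    norm = sum(likelihood for likelihood, _, _ in top)
    model = Counter()
    for likelihood, counts, length in top:
        for w, c in counts.items():
            model[w] += (likelihood / norm) * (c / length)
    return dict(model)


def mix_models(query, rel_models, weights, query_weight):
    """Interpolate the original query with one relevance model per collection."""
    if abs(query_weight + sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    mixed = Counter()
    for w, c in Counter(query).items():
        mixed[w] += query_weight * c / len(query)
    for lam, model in zip(weights, rel_models):
        for w, p in model.items():
            mixed[w] += lam * p
    return dict(mixed)


# Toy stand-ins for the target collection and one auxiliary collection.
target = [["patient", "heart", "attack"], ["patient", "fracture", "cast"]]
external = [
    ["heart", "attack", "myocardial", "infarction"],
    ["knee", "fracture", "surgery", "cast"],
]
query = ["heart", "attack"]

models = [relevance_model(query, c, top_k=1) for c in (target, external)]
expanded = mix_models(query, models, weights=[0.3, 0.2], query_weight=0.5)
top_terms = sorted(expanded, key=expanded.get, reverse=True)[:5]
print(top_terms)
```

In this toy run the auxiliary collection contributes "myocardial" and "infarction", terms that never appear in the target collection, which is the kind of synonymy gap the abstract says query expansion is meant to close. The per-collection weights are where the abstract's point about selective sampling would apply: a weight of zero drops a collection from the mixture.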
