Ensembles of randomized trees using diverse distributed representations of clinical events.

Authors

Henriksson Aron, Zhao Jing, Dalianis Hercules, Boström Henrik

Affiliation

Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden.

Publication

BMC Med Inform Decis Mak. 2016 Jul 21;16 Suppl 2(Suppl 2):69. doi: 10.1186/s12911-016-0309-0.

DOI:10.1186/s12911-016-0309-0
PMID:27459846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4965720/
Abstract

BACKGROUND

Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling.

METHODS

Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size.
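
The forest construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a patient record is turned into a feature vector, so summing event vectors per data type is an assumption, as are the toy semantic spaces, the event codes, and every function name. The tree learner is a simplified randomized tree that keeps the best of a few random splits.

```python
import random
from collections import Counter

def represent(record, space_set, dim):
    """Concatenate, per data type, the summed vectors of a record's events.
    ASSUMPTION: summation is an illustrative choice, not taken from the paper."""
    vec = []
    for dtype in sorted(space_set):
        part = [0.0] * dim
        for event in record.get(dtype, []):
            for i, v in enumerate(space_set[dtype].get(event, [0.0] * dim)):
                part[i] += v
        vec.extend(part)
    return vec

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, rng, depth=0, max_depth=5, n_candidates=5):
    """Grow a randomized tree: keep the best of a few random (feature, threshold) splits.
    Class labels must not be tuples, because tuples mark internal nodes."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf node
    best = None
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))
        t = rng.choice(X)[f]
        left = [i for i, row in enumerate(X) if row[f] <= t]
        right = [i for i, row in enumerate(X) if row[f] > t]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best
    children = [build_tree([X[i] for i in side], [y[i] for i in side],
                           rng, depth + 1, max_depth, n_candidates)
                for side in (left, right)]
    return (f, t, children[0], children[1])

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if x[f] <= t else high
    return node

def fit_forest(records, labels, spaces, dim, n_trees=25, seed=0):
    """Each tree: bootstrap replicate + one randomly chosen context window size.
    All data types of the replicate are represented at that window size."""
    rng = random.Random(seed)
    n = len(records)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap replicate
        window = rng.choice(sorted(spaces))         # semantic space set for this tree
        X = [represent(records[i], spaces[window], dim) for i in idx]
        y = [labels[i] for i in idx]
        forest.append((window, build_tree(X, y, rng)))
    return forest

def predict_forest(forest, record, spaces, dim):
    """Majority vote; each tree sees the record in its own semantic space set."""
    votes = Counter(predict_tree(tree, represent(record, spaces[window], dim))
                    for window, tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: window size -> data type -> event -> vector.
spaces = {
    2: {"diagnosis": {"I10": [1.0, 0.0], "E11": [0.0, 1.0]},
        "drug": {"C09": [1.0, 1.0], "A10": [0.0, 2.0]}},
    8: {"diagnosis": {"I10": [0.5, 0.5], "E11": [-1.0, 1.0]},
        "drug": {"C09": [2.0, 0.0], "A10": [0.0, -1.0]}},
}
records = [
    {"diagnosis": ["I10"], "drug": ["C09"]},
    {"diagnosis": ["E11"], "drug": ["A10"]},
    {"diagnosis": ["I10", "E11"], "drug": ["C09"]},
    {"diagnosis": ["E11"], "drug": []},
]
labels = [1, 0, 1, 0]
forest = fit_forest(records, labels, spaces, dim=2, n_trees=15, seed=0)
prediction = predict_forest(forest, records[0], spaces, dim=2)
```

The point the sketch tries to capture is that the window size is drawn once per tree and applied to every data type, so each tree sees the full feature set in one consistent set of semantic spaces. The alternatives named in the abstract would instead concatenate all spaces for every tree, or re-represent only a subset of the features.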

RESULTS

The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.
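
The two quantities contrasted in the follow-up analysis, average tree performance and ensemble diversity, can be illustrated with a small sketch. The abstract does not name the diversity measure used, so mean pairwise disagreement between trees is an assumption here, as are the function names and the toy predictions.

```python
from itertools import combinations

def average_tree_accuracy(tree_preds, y_true):
    """Mean accuracy of the individual trees, each scored on its own predictions."""
    accs = [sum(p == t for p, t in zip(preds, y_true)) / len(y_true)
            for preds in tree_preds]
    return sum(accs) / len(accs)

def pairwise_disagreement(tree_preds):
    """Mean fraction of examples on which two trees differ, over all tree pairs.
    ASSUMPTION: one common diversity measure, not necessarily the paper's."""
    rates = [sum(a != b for a, b in zip(p, q)) / len(p)
             for p, q in combinations(tree_preds, 2)]
    return sum(rates) / len(rates)

# Hypothetical predictions of three trees on four test examples.
tree_preds = [[1, 0, 1, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1]]
y_true = [1, 0, 1, 1]

avg_acc = average_tree_accuracy(tree_preds, y_true)
diversity = pairwise_disagreement(tree_preds)
```

A forest can gain overall accuracy even when this disagreement falls, provided the individual trees improve enough, which is the trade-off the results report for the proposed method.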

CONCLUSIONS

The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/130f/4965720/d1d9b9a03539/12911_2016_309_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/130f/4965720/96e9a6329a4a/12911_2016_309_Fig1_HTML.jpg
