Feature extraction for phenotyping from semantic and knowledge resources.

Affiliations

Department of Industrial Engineering, Tsinghua University, Beijing, China.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Publication information

J Biomed Inform. 2019 Mar;91:103122. doi: 10.1016/j.jbi.2019.103122. Epub 2019 Feb 7.

DOI: 10.1016/j.jbi.2019.103122
PMID: 30738949
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6424621/
Abstract

OBJECTIVE

Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health records (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded algorithms with high accuracy. However, those selection methods still require expert intervention to tweak the parameter settings according to the EHR data distribution for each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that leverages only publicly available medical knowledge sources, instead of EHR data.

METHODS

SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and represents each as a distributional semantic vector derived from neural word embeddings and the Unified Medical Language System (UMLS) Metathesaurus. The candidates semantically closest to the target phenotype are selected for the final classification algorithm; how many are kept, enough to sufficiently characterize the phenotype, is determined by a linear decomposition criterion.
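The abstract does not spell out the linear decomposition criterion, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes concept vectors are already available (for example, from a word embedding), ranks candidates by cosine similarity to the phenotype's vector, and then adds them greedily while each one still reduces the least-squares residual of the phenotype vector, projected onto the span of the selected vectors, by at least `min_gain`. The stopping rule, `min_gain`, `select_features`, and the toy concept names and vectors are all hypothetical, not SEDFE's published method.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def select_features(
    target: Vector,
    candidates: Dict[str, Vector],
    min_gain: float = 0.05,
    max_features: Optional[int] = None,
) -> List[str]:
    """Hypothetical selection rule (an assumption, not SEDFE's criterion):
    rank candidates by cosine similarity to the phenotype vector, then add them
    greedily while each lowers the relative least-squares residual of the
    target (projected onto the span of selected vectors) by >= min_gain."""
    t_norm = norm(target)
    if t_norm == 0.0:
        return []
    # sorted() is stable, so ties keep the dict's insertion order.
    ranked = sorted(candidates, key=lambda n: cosine(target, candidates[n]), reverse=True)
    basis: List[List[float]] = []  # orthonormal basis of the selected vectors' span
    residual = list(target)        # target minus its projection onto that span
    prev_rel = 1.0                 # relative residual with no features selected
    selected: List[str] = []
    for name in ranked:
        if max_features is not None and len(selected) >= max_features:
            break
        # Gram-Schmidt: strip components already explained by the basis.
        v = list(candidates[name])
        for q in basis:
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        v_norm = norm(v)
        if v_norm < 1e-12:
            continue  # linearly dependent on features already selected
        q = [vi / v_norm for vi in v]
        c = dot(residual, q)
        new_residual = [ri - c * qi for ri, qi in zip(residual, q)]
        rel = norm(new_residual) / t_norm
        if prev_rel - rel < min_gain:
            break  # next-closest concept explains too little; stop here
        basis.append(q)
        residual, prev_rel = new_residual, rel
        selected.append(name)
    return selected


# Toy 3-d vectors standing in for concept embeddings (made up for illustration).
phenotype = [1.0, 1.0, 0.0]
concepts = {
    "angina": [1.0, 0.0, 0.0],
    "troponin": [0.0, 1.0, 0.0],
    "fracture": [0.0, 0.0, 1.0],
}
print(select_features(phenotype, concepts))  # ['angina', 'troponin']
```

Here the two concepts aligned with the phenotype vector jointly explain it completely, so the unrelated "fracture" concept adds nothing and selection stops. Projecting onto the span of the selected vectors, rather than summing similarities, is one way to keep near-duplicate concepts from being counted twice; the threshold-based stop is a simple stand-in for whatever criterion the paper actually uses.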

RESULTS

SEDFE was compared with the EHR-based SAFE algorithm and with domain experts on feature selection for classifying five phenotypes (coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension), using both supervised and unsupervised approaches. Algorithms built on SEDFE-selected features achieved accuracy comparable to those built on SAFE-selected and expert-curated features. SEDFE is also robust to the choice of input semantic vectors.

CONCLUSION

SEDFE attains satisfactory performance in unsupervised feature selection for EHR phenotyping. Because it is both fully automated and independent of EHR data, the method promises efficient and accurate development of algorithms for high-throughput phenotyping.

Similar articles

1. Surrogate-assisted feature extraction for high-throughput phenotyping.
J Am Med Inform Assoc. 2017 Apr 1;24(e1):e143-e149. doi: 10.1093/jamia/ocw135.
2. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.
J Am Med Inform Assoc. 2015 Sep;22(5):993-1000. doi: 10.1093/jamia/ocv034. Epub 2015 Apr 29.
3. Automated feature selection of predictors in electronic medical records data.
Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.
4. Scalable relevance ranking algorithm via semantic similarity assessment improves efficiency of medical chart review.
J Biomed Inform. 2022 Aug;132:104109. doi: 10.1016/j.jbi.2022.104109. Epub 2022 Jun 1.
5. Enabling phenotypic big data with PheNorm.
J Am Med Inform Assoc. 2018 Jan 1;25(1):54-60. doi: 10.1093/jamia/ocx111.
6. ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis.
medRxiv. 2023 May 21:2023.05.14.23289955. doi: 10.1101/2023.05.14.23289955.
7. Weakly Semi-supervised phenotyping using Electronic Health records.
J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.
8. A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
9. Applying active learning to high-throughput phenotyping algorithms for electronic health records data.
J Am Med Inform Assoc. 2013 Dec;20(e2):e253-9. doi: 10.1136/amiajnl-2013-001945. Epub 2013 Jul 13.

Articles citing this article

1. A methodological framework for integrating model-guided medicine and multidimensional information management systems: application in anti-aging healthcare.
Int J Comput Assist Radiol Surg. 2025 May 15. doi: 10.1007/s11548-025-03337-w.
2. Machine learning approaches for electronic health records phenotyping: a methodical review.
J Am Med Inform Assoc. 2023 Jan 18;30(2):367-381. doi: 10.1093/jamia/ocac216.
3. Artificial Intelligence in Rheumatoid Arthritis: Current Status and Future Perspectives: A State-of-the-Art Review.
Rheumatol Ther. 2022 Oct;9(5):1249-1304. doi: 10.1007/s40744-022-00475-4. Epub 2022 Jul 18.
4. Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study.
JMIR Med Inform. 2021 Jul 23;9(7):e19905. doi: 10.2196/19905.
5. What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask.
J Med Internet Res. 2021 Mar 2;23(3):e22219. doi: 10.2196/22219.
6. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches.
Seizure. 2021 Feb;85:138-144. doi: 10.1016/j.seizure.2020.11.011. Epub 2021 Jan 13.
7. Comparative analysis, applications, and interpretation of electronic health record-based stroke phenotyping methods.
BioData Min. 2020 Dec 7;13(1):21. doi: 10.1186/s13040-020-00230-x.
8. High-throughput phenotyping with temporal sequences.
J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.
9. Generative transfer learning for measuring plausibility of EHR diagnosis records.
J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.
10. Automated ICD coding via unsupervised knowledge integration (UNITE).
Int J Med Inform. 2020 Jul;139:104135. doi: 10.1016/j.ijmedinf.2020.104135. Epub 2020 Apr 4.

References cited in this article

1. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data.
Pac Symp Biocomput. 2020;25:295-306.
2. Development of an automated phenotyping algorithm for hepatorenal syndrome.
J Biomed Inform. 2018 Apr;80:87-95. doi: 10.1016/j.jbi.2018.03.001. Epub 2018 Mar 9.
3. Enabling phenotypic big data with PheNorm.
J Am Med Inform Assoc. 2018 Jan 1;25(1):54-60. doi: 10.1093/jamia/ocx111.
4. A Computable Phenotype Improves Cohort Ascertainment in a Pediatric Pulmonary Hypertension Registry.
J Pediatr. 2017 Sep;188:224-231.e5. doi: 10.1016/j.jpeds.2017.05.037. Epub 2017 Jun 16.
5. Surrogate-assisted feature extraction for high-throughput phenotyping.
J Am Med Inform Assoc. 2017 Apr 1;24(e1):e143-e149. doi: 10.1093/jamia/ocw135.
6. Corpus domain effects on distributional semantic modeling of medical terms.
Bioinformatics. 2016 Dec 1;32(23):3635-3644. doi: 10.1093/bioinformatics/btw529. Epub 2016 Aug 16.
7. Learning statistical models of phenotypes using noisy labeled training data.
J Am Med Inform Assoc. 2016 Nov;23(6):1166-1173. doi: 10.1093/jamia/ocw028. Epub 2016 May 12.
8. Electronic medical record phenotyping using the anchor and learn framework.
J Am Med Inform Assoc. 2016 Jul;23(4):731-40. doi: 10.1093/jamia/ocw011. Epub 2016 Apr 23.
9. A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation.
BMC Med Inform Decis Mak. 2016 Mar 3;16:30. doi: 10.1186/s12911-016-0269-4.
10. Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models.
J Biomed Inform. 2016 Apr;60:334-41. doi: 10.1016/j.jbi.2016.02.011. Epub 2016 Feb 26.