Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking.

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom; National Institute of Informatics, Tokyo, Japan.

Publication information

PLoS One. 2013 Oct 14;8(10):e72965. doi: 10.1371/journal.pone.0072965. eCollection 2013.

DOI: 10.1371/journal.pone.0072965
PMID: 24155869
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3796529/
Abstract

The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because phenotype descriptions vary widely, a number of challenges remain in identifying and semantically normalising them before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules, as well as of the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts, related to auto-immune diseases, that are cited by the Online Mendelian Inheritance in Man (OMIM) database. Using partial matching, the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed an advantage of SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether, we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.
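
The abstract reports a micro-averaged F-score under partial matching. As a rough illustration of what such a metric can look like, the following Python sketch pools all entity classes before computing precision and recall. The `Span` type, the overlap rule (same class and at least one shared character) and the function names are assumptions made for this example; the paper's exact matching criterion is not stated in the abstract and may differ.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset within one document
    end: int     # exclusive character offset
    label: str   # entity class, e.g. "PHENOTYPE" or "GENE"


def overlaps(a: Span, b: Span) -> bool:
    """Partial match (assumed definition): same class, at least one shared character."""
    return a.label == b.label and a.start < b.end and b.start < a.end


def micro_f1(gold: list[Span], pred: list[Span]) -> float:
    """Micro-averaged F1: counts are pooled over all classes, not averaged per class."""
    # A predicted span is correct if it partially matches any gold span.
    correct_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # A gold span is found if any predicted span partially matches it.
    found_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for a single abstract.
gold = [Span(0, 22, "PHENOTYPE"), Span(30, 34, "GENE"), Span(40, 50, "CHEMICAL")]
pred = [Span(5, 22, "PHENOTYPE"), Span(30, 34, "GENE"), Span(60, 65, "GENE")]
score = micro_f1(gold, pred)
```

Here two of three predictions overlap a gold span of the same class and two of three gold spans are found, so precision, recall and F1 are all 2/3. The sketch handles one document at a time; for a corpus, the counts would be summed over all documents before the final division.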

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/3796529/a233d0940334/pone.0072965.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/3796529/3013f1c5eb4c/pone.0072965.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/3796529/63ca368c23a2/pone.0072965.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/3796529/ce6aa94a54bc/pone.0072965.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/3796529/608e00133a92/pone.0072965.g007.jpg
