Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

Authors

Cohen A M, Hersh W R, Dubay C, Spackman K

Affiliation

Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Portland, Oregon 97239-3098, USA.

Publication details

BMC Bioinformatics. 2005 Apr 22;6:103. doi: 10.1186/1471-2105-6-103.

DOI:10.1186/1471-2105-6-103
PMID:15847682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1090552/

Abstract

BACKGROUND

Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction.
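
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a symbol co-occurrence network, not the authors' method. It assumes that symbols are whitespace-separated tokens, that two symbols are linked when they appear in the same abstract, and that synonym candidates are symbols whose neighborhoods overlap strongly. The function names, the Jaccard overlap measure, and the toy sentences are all illustrative assumptions.

```python
from itertools import combinations


def build_cooccurrence_network(documents):
    """Map each symbol to the set of symbols it shares a document with."""
    neighbors = {}
    for text in documents:
        # Assumption: a "symbol" is any lowercased whitespace-separated token.
        symbols = set(text.lower().split())
        for a, b in combinations(sorted(symbols), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    return neighbors


def neighbor_overlap(neighbors, a, b):
    """Jaccard overlap of the neighborhoods of a and b, ignoring a and b themselves."""
    na = neighbors.get(a, set()) - {a, b}
    nb = neighbors.get(b, set()) - {a, b}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Invented toy "abstracts", for illustration only.
toy_abstracts = [
    "p53 regulates apoptosis",
    "tp53 regulates apoptosis",
    "p53 binds mdm2",
    "tp53 binds mdm2",
    "brca1 repairs dna",
]

network = build_cooccurrence_network(toy_abstracts)
print(neighbor_overlap(network, "p53", "tp53"))   # identical neighborhoods
print(neighbor_overlap(network, "p53", "brca1"))  # disjoint neighborhoods
```

In this toy setting, two names that never co-occur but share the same context symbols score high, which is the intuition behind using network structure rather than direct co-occurrence.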

RESULTS

Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs.
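
For readers unfamiliar with the metric, the sketch below computes the balanced F-score as the harmonic mean of precision and recall. That the paper uses this balanced form is an assumption here, since the abstract does not state the weighting. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both values are zero
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
print(round(f_score(23.18, 21.36), 1))
```

With the rounded precision and recall from the abstract, this gives roughly 22.2%, in line with the reported maximum F-score.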

CONCLUSION

The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/d2cad4b4d4b0/1471-2105-6-103-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/3261c1dd89af/1471-2105-6-103-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/5e476742726f/1471-2105-6-103-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/8bebee06dab1/1471-2105-6-103-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/d7cd3541ec2c/1471-2105-6-103-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/cd988d51f10e/1471-2105-6-103-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/03a554fa5eaa/1471-2105-6-103-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/1090552/152697af9eeb/1471-2105-6-103-8.jpg

Similar articles

1
Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.
BMC Bioinformatics. 2005 Apr 22;6:103. doi: 10.1186/1471-2105-6-103.
2
MILANO--custom annotation of microarray results using automatic literature searches.
BMC Bioinformatics. 2005 Jan 20;6:12. doi: 10.1186/1471-2105-6-12.
3
PSE: a tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords.
BMC Bioinformatics. 2005 Dec 10;6:295. doi: 10.1186/1471-2105-6-295.
4
Comparison of character-level and part of speech features for name recognition in biomedical texts.
J Biomed Inform. 2004 Dec;37(6):423-35. doi: 10.1016/j.jbi.2004.08.008.
5
Automated recognition of malignancy mentions in biomedical literature.
BMC Bioinformatics. 2006 Nov 7;7:492. doi: 10.1186/1471-2105-7-492.
6
Gene name identification and normalization using a model organism database.
J Biomed Inform. 2004 Dec;37(6):396-410. doi: 10.1016/j.jbi.2004.08.010.
7
Building a protein name dictionary from full text: a machine learning term extraction approach.
BMC Bioinformatics. 2005 Apr 7;6:88. doi: 10.1186/1471-2105-6-88.
8
Automatic extraction of gene/protein biological functions from biomedical text.
Bioinformatics. 2005 Apr 1;21(7):1227-36. doi: 10.1093/bioinformatics/bti084. Epub 2004 Oct 27.
9
Recognizing names in biomedical texts: a machine learning approach.
Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
10
Terminology-driven mining of biomedical literature.
Bioinformatics. 2003 May 22;19(8):938-43. doi: 10.1093/bioinformatics/btg105.

Cited by

1
Constructing Genetic Networks using Biomedical Literature and Rare Event Classification.
Sci Rep. 2017 Nov 17;7(1):15784. doi: 10.1038/s41598-017-16081-2.
2
Expansion of medical vocabularies using distributional semantics on Japanese patient blogs.
J Biomed Semantics. 2016 Sep 26;7(1):58. doi: 10.1186/s13326-016-0093-x.
3
Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach.
PLoS One. 2016 May 19;11(5):e0156091. doi: 10.1371/journal.pone.0156091. eCollection 2016.
4
Semi-Supervised Learning to Identify UMLS Semantic Relations.
AMIA Jt Summits Transl Sci Proc. 2014 Apr 7;2014:67-75. eCollection 2014.
5
PubMedMiner: Mining and Visualizing MeSH-based Associations in PubMed.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1990-9. eCollection 2014.
6
Characterizing the sublanguage of online breast cancer forums for medications, symptoms, and emotions.
AMIA Annu Symp Proc. 2014 Nov 14;2014:516-25. eCollection 2014.
7
Extraction of temporal networks from term co-occurrences in online textual sources.
PLoS One. 2014 Dec 3;9(12):e99515. doi: 10.1371/journal.pone.0099515. eCollection 2014.
8
Synonym extraction and abbreviation expansion with ensembles of semantic spaces.
J Biomed Semantics. 2014 Feb 5;5(1):6. doi: 10.1186/2041-1480-5-6.
9
Translating clinical findings into knowledge in drug safety evaluation--drug induced liver injury prediction system (DILIps).
PLoS Comput Biol. 2011 Dec;7(12):e1002310. doi: 10.1371/journal.pcbi.1002310. Epub 2011 Dec 15.
10
Googling social interactions: web search engine based social network construction.
PLoS One. 2010 Jul 21;5(7):e11233. doi: 10.1371/journal.pone.0011233.

References

1
Mining MEDLINE for implicit links between dietary substances and diseases.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i290-6. doi: 10.1093/bioinformatics/bth914.
2
Interference of BCR-ABL1 kinase activity with antigen receptor signaling in B cell precursor leukemia cells.
Cell Cycle. 2004 Jul;3(7):858-60. Epub 2004 Jul 25.
3
Gene indexing: characterization and analysis of NLM's GeneRIFs.
AMIA Annu Symp Proc. 2003;2003:460-4.
4
NOD2/CARD15 variants are associated with lower weight at diagnosis in children with Crohn's disease.
Am J Gastroenterol. 2003 Nov;98(11):2479-84. doi: 10.1111/j.1572-0241.2003.08673.x.
5
DAX1 and its network partners: exploring complexity in development.
Mol Genet Metab. 2003 Sep-Oct;80(1-2):81-120. doi: 10.1016/j.ymgme.2003.08.023.
6
Extracting synonymous gene and protein terms from biological literature.
Bioinformatics. 2003;19 Suppl 1:i340-9. doi: 10.1093/bioinformatics/btg1047.
7
Rutabaga by any other name: extracting biological names.
J Biomed Inform. 2002 Aug;35(4):247-59. doi: 10.1016/s1532-0464(03)00014-5.
8
PTEN decreases in vivo vascularization of experimental gliomas in spite of proangiogenic stimuli.
Cancer Res. 2003 May 1;63(9):2300-5.
9
p21 expression predicts outcome in p53-null ovarian carcinoma.
Clin Cancer Res. 2003 Mar;9(3):1028-32.
10
Mining terminological knowledge in large biomedical corpora.
Pac Symp Biocomput. 2003:415-26.