
HypertenGene: extracting key hypertension genes from biomedical literature with position and automatically-generated template features.

Affiliation

Department of Computer Science and Engineering, Yuan Ze University, Chung Li, Taiwan, Republic of China.

Publication information

BMC Bioinformatics. 2009 Dec 3;10 Suppl 15(Suppl 15):S9. doi: 10.1186/1471-2105-10-S15-S9.

DOI:10.1186/1471-2105-10-S15-S9
PMID:19958519
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2788360/
Abstract

BACKGROUND

The genetic factors leading to hypertension have been extensively studied, and a large number of research papers have been published on the subject. A primary task for hypertension researchers is to locate key hypertension-related genes in abstracts. However, gathering such information with existing tools is not easy: (1) Searching for articles often returns far too many hits to browse through. (2) The search results do not highlight the hypertension-related genes discovered in the abstract. (3) Even though some text mining services mark up gene names in the abstract, the key genes investigated in a paper are still not distinguished from other genes. To facilitate information gathering for hypertension researchers, one solution is a system that extracts the key hypertension-related genes in each abstract. Constructing such a system involves three major tasks: (1) gene and hypertension named entity recognition, (2) section categorization, and (3) gene-hypertension relation extraction.

RESULTS

We first compare the retrieval performance achieved by adding template features and position features individually to the baseline system, and then examine the combination of both. Position features raise the AUC of the baseline system from 0.4936 to 0.8140, an increase of about 65%. Adding template features alone, however, yields only a marginal improvement (0.0197). Including both sets of features improves the AUC to 0.8184, indicating that the two sets are complementary and do not have overlapping effects. We then examine the performance in a different domain, diabetes, and the result shows a satisfactory AUC of 0.83.
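To make the reported figures concrete, the following is a minimal sketch of how an AUC score can be computed for a ranked list of candidate genes, together with the arithmetic behind the comparisons above. The function name `auc` and the toy scores are illustrative assumptions, not the authors' evaluation code, and the template-only AUC is derived on the assumption that 0.0197 is an absolute gain over the baseline.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive item
    (label 1) is scored above a randomly chosen negative item
    (label 0); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 1 = key gene, 0 = other gene.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])

# AUC values quoted in the abstract.
baseline = 0.4936
with_position = 0.8140
with_both = 0.8184

# Assumption: the 0.0197 improvement is an absolute gain over the baseline.
with_template = round(baseline + 0.0197, 4)

position_ratio = round(with_position / baseline, 2)      # fold change
extra_from_templates = round(with_both - with_position, 4)

print(toy_auc, with_template, position_ratio, extra_from_templates)
```

An AUC near 0.5 corresponds to a ranking no better than chance, which is why the move from 0.4936 to 0.8140 matters more than the raw difference suggests.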

CONCLUSION

Our approach successfully exploits template features to recognize true hypertension-related gene mentions and position features to distinguish key genes from other related genes. Templates are automatically generated and then checked by biologists to minimize labor costs. Our approach integrates the advantages of machine learning models and pattern matching. To the best of our knowledge, this is the first systematic study of extracting hypertension-related genes and the first attempt to create a hypertension-gene relation corpus based on the GAD database. Furthermore, our paper proposes and tests novel features for extracting key hypertension genes, such as relative position, section, and template features, which could also be applied to key-gene extraction for other diseases.
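The three feature types named above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not specify how the features are encoded, the two regular expressions stand in for the automatically generated templates, and `mention_features` is a hypothetical helper rather than part of the described system.

```python
import re

# Hypothetical templates. In the paper, templates are generated
# automatically and then checked by biologists.
TEMPLATES = [
    re.compile(r"GENE (?:is|was) associated with DISEASE"),
    re.compile(r"polymorphisms? (?:of|in) GENE"),
]


def mention_features(abstract, gene, disease="hypertension"):
    """Build one feature dict per sentence that mentions `gene`.

    `abstract` is a list of (section, sentence) pairs in document order.
    """
    n = len(abstract)
    rows = []
    for i, (section, sentence) in enumerate(abstract):
        if gene not in sentence:
            continue
        # Mask the entities so templates match any gene/disease pair.
        masked = sentence.replace(gene, "GENE").replace(disease, "DISEASE")
        rows.append({
            # 0.0 = first sentence, 1.0 = last sentence of the abstract
            "relative_position": i / (n - 1) if n > 1 else 0.0,
            "section": section,
            "template_match": any(t.search(masked) for t in TEMPLATES),
        })
    return rows


example = [
    ("BACKGROUND", "Many genes have been studied in hypertension."),
    ("RESULTS", "Levels of REN were also measured."),
    ("CONCLUSION", "AGT is associated with hypertension."),
]
print(mention_features(example, "AGT"))
print(mention_features(example, "REN"))
```

In this toy abstract, the mention of AGT sits in the final sentence of the CONCLUSION section and matches a template, while REN appears mid-abstract without a template match; a classifier trained on such features could use those differences to separate key genes from incidental ones.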

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/e47105721afc/1471-2105-10-S15-S9-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/20aa4511ef77/1471-2105-10-S15-S9-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/ed13d59a967b/1471-2105-10-S15-S9-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/91b9553fd819/1471-2105-10-S15-S9-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/d2823d518d59/1471-2105-10-S15-S9-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9382/2788360/525e7d0a6bfa/1471-2105-10-S15-S9-6.jpg