
Search Datasets in Literature: A Case Study of GWAS.

Authors

Dong Xiao, Zhang Yaoyun, Xu Hua

Affiliation

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

AMIA Jt Summits Transl Sci Proc. 2017 Jul 26;2017:40-49. eCollection 2017.

PMID: 28815103
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5543360/

Abstract

One of the missions of the NIH BD2K (Big Data to Knowledge) initiative is to make data discoverable and promote the re-use of existing datasets. Our ultimate goal is to develop a scalable approach that can automatically scan millions of scientific publications and identify underlying data sets. Using Genome-Wide Association Studies (GWAS) as a use case, we conducted an initial study to identify GWAS dataset attributes in MEDLINE abstracts, by developing a hybrid approach that combines domain dictionaries and pattern-based rules. The automatic GWAS dataset attribute recognition system achieved an F-measure of 84.85%. We further applied the GWAS attribute recognition system to indexing MEDLINE abstracts and built an online GWAS dataset search engine called "GWAS Dataset Finder". Our evaluation showed that the GWAS Dataset Finder outperformed PubMed significantly in retrieving literature with desired datasets. Our study demonstrates the potential application of text mining methods in building the data discovery index. It can create a better index of literature linked with their underlying data sets, thus improving data discoverability.
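
The hybrid approach described in the abstract (domain dictionaries combined with pattern-based rules, scored by F-measure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the attribute kinds, the tiny dictionaries, the sample-size pattern, and the function names are hypothetical, since the abstract does not list the actual dictionaries or rules used by the authors.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionaries (assumption): stand-ins for real domain dictionaries.
DICTIONARIES = {
    "disease": {"type 2 diabetes", "breast cancer", "asthma"},
    "population": {"european", "han chinese", "african american", "japanese"},
    "platform": {"affymetrix", "illumina"},
}

# Hypothetical pattern-based rule (assumption): a number followed by a cohort noun.
SAMPLE_SIZE = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(cases|controls|individuals|participants|subjects)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Attribute:
    kind: str   # attribute type, e.g. "disease" or "sample_size"
    text: str   # matched surface string
    start: int  # character offset where the match begins
    end: int    # character offset where the match ends


def dictionary_matches(text, kind, terms):
    """Find case-insensitive whole-word occurrences of each dictionary term."""
    found = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            found.append(Attribute(kind, m.group(0), m.start(), m.end()))
    return found


def rule_matches(text):
    """Apply the pattern-based rule for sample sizes."""
    return [
        Attribute("sample_size", m.group(0), m.start(), m.end())
        for m in SAMPLE_SIZE.finditer(text)
    ]


def extract_attributes(text):
    """Combine dictionary lookups and rule matches, ordered by position."""
    attrs = []
    for kind, terms in DICTIONARIES.items():
        attrs.extend(dictionary_matches(text, kind, terms))
    attrs.extend(rule_matches(text))
    return sorted(attrs, key=lambda a: (a.start, a.end))


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall) over two sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = (
        "We genotyped 1,200 cases and 2,500 controls of European ancestry "
        "with type 2 diabetes using Illumina arrays."
    )
    for attribute in extract_attributes(sentence):
        print(attribute.kind, "->", attribute.text)
```

In this sketch the dictionaries handle closed vocabularies such as disease or platform names, while the regular expression covers open-ended values such as sample sizes that no dictionary could enumerate. Comparing the extracted set against a manually annotated gold set with f_measure gives the kind of score the abstract reports, although the exact matching criteria used in the study are not stated there.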
