Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD).

Author information

Jiang Xiangying, Ringwald Martin, Blake Judith, Shatkay Hagit

Affiliations

Department of Computer and Information Sciences, University of Delaware, 101 Smith Hall, Newark, DE, USA.

Department of Computer and Information Sciences, The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA.

Publication information

Database (Oxford). 2017 Jan 1;2017(1). doi: 10.1093/database/bax017.

DOI:10.1093/database/bax017
PMID:28365740
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5467553/
Abstract

The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed to support the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant. In addition to text from the title and abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification task. Moreover, using information obtained from image captions clearly improves performance compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area.

DATABASE URL

www.informatics.jax.org.
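The abstract says only that the scheme "uses readily available tools while employing feature selection"; it does not name the classifier or the selection criterion. The following is a hedged, self-contained illustration of that general pattern (select discriminative terms, then train a simple classifier on them), not the authors' pipeline. It swaps in chi-square feature selection over document frequencies and a multinomial Naive Bayes classifier, both assumptions, written with the Python standard library. The names `tokenize`, `chi2_select` and `NaiveBayes`, and the six toy snippets, are hypothetical and are not drawn from the GXD dataset.

```python
# Hypothetical sketch: chi-square feature selection + multinomial Naive Bayes.
# Both techniques are assumed stand-ins, not the method used for GXD triage.
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chi2_select(docs, labels, k):
    """Return the k terms with the highest chi-square score.

    Scores come from a 2x2 table of document counts (term present/absent
    versus relevant/irrelevant); ties are broken alphabetically.
    """
    n, n_pos = len(docs), sum(labels)
    df_all, df_pos = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for term in set(tokenize(doc)):
            df_all[term] += 1
            if y == 1:
                df_pos[term] += 1
    scores = {}
    for term, n_t in df_all.items():
        a = df_pos[term]        # relevant docs containing the term
        b = n_t - a             # irrelevant docs containing the term
        c = n_pos - a           # relevant docs without the term
        d = (n - n_pos) - b     # irrelevant docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


class NaiveBayes:
    """Multinomial Naive Bayes restricted to a selected vocabulary.

    Labels are 1 (relevant) and 0 (irrelevant); both must occur in the training data.
    """

    def __init__(self, vocab):
        self.vocab = vocab

    def fit(self, docs, labels):
        self.log_prior, self.log_lik = {}, {}
        for cls in (0, 1):
            cls_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / len(docs))
            counts = Counter(t for d in cls_docs for t in tokenize(d) if t in self.vocab)
            total = sum(counts.values()) + len(self.vocab)  # add-one (Laplace) smoothing
            self.log_lik[cls] = {t: math.log((counts[t] + 1) / total) for t in self.vocab}
        return self

    def predict(self, doc):
        # Out-of-vocabulary tokens are ignored; the higher log-posterior wins.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {cls: self.log_prior[cls] + sum(self.log_lik[cls][t] for t in tokens)
                  for cls in (0, 1)}
        return max(scores, key=scores.get)


# Toy title-and-abstract snippets (invented for illustration): 1 = relevant, 0 = irrelevant.
docs = [
    "in situ hybridization shows gene expression in mouse embryo",
    "immunohistochemistry detects protein expression in embryonic mouse limb",
    "rna in situ analysis of expression during mouse development",
    "crystal structure of bacterial enzyme",
    "clinical trial of new drug in adult patients",
    "survey of hospital costs in adult patients",
]
labels = [1, 1, 1, 0, 0, 0]

vocab = chi2_select(docs, labels, k=6)
model = NaiveBayes(vocab).fit(docs, labels)
print(sorted(vocab))  # ['adult', 'expression', 'mouse', 'of', 'patients', 'situ']
print(model.predict("expression of gene in mouse embryo"))  # 1
print(model.predict("drug costs in adult patients"))        # 0
```

Because the chi-square score measures association in either direction, terms typical of irrelevant papers (here `adult`, `patients`) are kept alongside GXD-like terms (`expression`, `mouse`), giving the classifier evidence for both classes. The abstract also reports a captions-based classifier for the roughly 3300 full-text documents; under the same assumptions, a second instance of this pipeline could be trained on caption text, though the abstract does not say how caption evidence was combined with title-and-abstract evidence.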



Similar articles

1
Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD).
Database (Oxford). 2017 Jan 1;2017(1). doi: 10.1093/database/bax017.
2
Integrating image caption information into biomedical document classification in support of biocuration.
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa024.
3
An effective biomedical document classification scheme in support of biocuration: addressing class imbalance.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz045.
4
GXD: a community resource of mouse Gene Expression Data.
Mamm Genome. 2015 Aug;26(7-8):314-24. doi: 10.1007/s00335-015-9563-1. Epub 2015 May 5.
5
GXD's RNA-Seq and Microarray Experiment Search: using curated metadata to reliably find mouse expression studies of interest.
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa002.
6
The mouse Gene Expression Database (GXD): 2011 update.
Nucleic Acids Res. 2011 Jan;39(Database issue):D835-41. doi: 10.1093/nar/gkq1132. Epub 2010 Nov 9.
7
Utilizing image and caption information for biomedical document classification.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i468-i476. doi: 10.1093/bioinformatics/btab331.
8
BioReader: a text mining tool for performing classification of biomedical literature.
BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):57. doi: 10.1186/s12859-019-2607-x.
9
The mouse Gene Expression Database (GXD): updates and enhancements.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D568-71. doi: 10.1093/nar/gkh069.
10
The mouse Gene Expression Database (GXD): 2014 update.
Nucleic Acids Res. 2014 Jan;42(Database issue):D818-24. doi: 10.1093/nar/gkt954. Epub 2013 Oct 25.

Cited by

1
MetaTron: advancing biomedical annotation empowering relation annotation and collaboration.
BMC Bioinformatics. 2024 Mar 14;25(1):112. doi: 10.1186/s12859-024-05730-9.
2
Automatic identification of scientific publications describing digital reconstructions of neural morphology.
Brain Inform. 2023 Sep 8;10(1):23. doi: 10.1186/s40708-023-00202-x.
3
Automatic identification of scientific publications describing digital reconstructions of neural morphology.
bioRxiv. 2023 Feb 15:2023.02.14.527522. doi: 10.1101/2023.02.14.527522.
4
Classifying domain-specific text documents containing ambiguous keywords.
Database (Oxford). 2021 Sep 29;2021:baab062. doi: 10.1093/database/baab062.
5
Utilizing image and caption information for biomedical document classification.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i468-i476. doi: 10.1093/bioinformatics/btab331.
6
Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD).
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa043.
7
Recent advances in biomedical literature mining.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa057.
8
Integrating image caption information into biomedical document classification in support of biocuration.
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa024.
9
An effective biomedical document classification scheme in support of biocuration: addressing class imbalance.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz045.
10
A statistical approach to identify, monitor, and manage incomplete curated data sets.
BMC Bioinformatics. 2018 Apr 2;19(1):110. doi: 10.1186/s12859-018-2121-6.