Ontology-driven indexing of public datasets for translational bioinformatics.

Authors

Shah Nigam H, Jonquet Clement, Chiang Annie P, Butte Atul J, Chen Rong, Musen Mark A

Affiliation

Centre for Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA 94305, USA.

Publication

BMC Bioinformatics. 2009 Feb 5;10 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-10-S2-S1.

Abstract

The volume of publicly available genome-scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, which makes it difficult to integrate the datasets across repositories. We have previously developed methods to map text annotations of tissue microarrays to concepts in the NCI Thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the Unified Medical Language System (UMLS). We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus (GEO). We show that our methods enable ontology-based querying and integration of tissue microarray and gene expression microarray data, and that they allow datasets on specific diseases to be identified across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work, we have built a prototype system for ontology-based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements, such as gene expression datasets, descriptions of radiology images, clinical trial reports, and PubMed article abstracts, and annotates and indexes them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.
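The abstract does not spell out how the matching or indexing is done. The following is a rough illustration only, not the authors' method. It assumes a simple dictionary-based concept recognizer (greedy longest match) and a small is-a hierarchy. Both are hypothetical toys standing in for the UMLS, and every concept ID and dataset ID in it is invented. The sketch annotates free-text dataset descriptions with concepts and builds a concept-to-dataset index. Each annotation is also propagated to its ancestor concepts, so a query for a general disease concept returns datasets that were annotated with more specific ones.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon mapping surface strings to concept IDs.
# These IDs are invented placeholders, NOT real UMLS CUIs.
LEXICON = {
    "breast cancer": "C_BREAST_CARCINOMA",
    "breast carcinoma": "C_BREAST_CARCINOMA",
    "carcinoma": "C_CARCINOMA",
    "adenocarcinoma": "C_ADENOCARCINOMA",
    "lung adenocarcinoma": "C_LUNG_ADENOCARCINOMA",
}

# Hypothetical is-a hierarchy: concept -> list of direct parents.
PARENTS = {
    "C_BREAST_CARCINOMA": ["C_CARCINOMA"],
    "C_LUNG_ADENOCARCINOMA": ["C_ADENOCARCINOMA"],
    "C_ADENOCARCINOMA": ["C_CARCINOMA"],
    "C_CARCINOMA": [],
}

# Hypothetical free-text dataset descriptions from two repositories.
DATASETS = {
    "repoA/ds1": "Primary breast cancer tumors vs. normal breast tissue",
    "repoA/ds2": "Healthy liver tissue time course",
    "repoB/ds7": "Tissue microarray of lung adenocarcinoma cores",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def annotate(text, lexicon=LEXICON, max_len=4):
    """Return the set of concept IDs found by greedy longest-match lookup."""
    tokens = tokenize(text)
    found = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.add(lexicon[phrase])
                i += n
                break
        else:
            i += 1  # no lexicon term starts here; advance one token
    return found


def ancestors(concept, parents=PARENTS):
    """Collect all transitive is-a ancestors of a concept."""
    seen, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_index(datasets):
    """Map each concept (and each of its ancestors) to the datasets annotated with it."""
    index = defaultdict(set)
    for ds_id, text in datasets.items():
        for concept in annotate(text):
            for c in {concept} | ancestors(concept):
                index[c].add(ds_id)
    return index


def query(index, concept):
    """Return the sorted IDs of datasets indexed under a concept."""
    return sorted(index.get(concept, set()))


if __name__ == "__main__":
    idx = build_index(DATASETS)
    print(query(idx, "C_CARCINOMA"))       # ['repoA/ds1', 'repoB/ds7']
    print(query(idx, "C_ADENOCARCINOMA"))  # ['repoB/ds7']
```

In this sketch, annotations are propagated to ancestors at indexing time. Expanding a query to its descendant concepts at search time would return the same datasets. A real system would draw on a full terminology such as the UMLS and a more robust concept recognizer than exact dictionary lookup.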

Similar articles

1. Comparison of concept recognizers for building the Open Biomedical Annotator.
   BMC Bioinformatics. 2009 Sep 17;10 Suppl 9(Suppl 9):S14. doi: 10.1186/1471-2105-10-S9-S14.
2. Annotation and query of tissue microarray data using the NCI Thesaurus.
   BMC Bioinformatics. 2007 Aug 8;8:296. doi: 10.1186/1471-2105-8-296.
3. Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets.
   J Biomed Inform. 2011 Dec;44 Suppl 1(Suppl 1):S39-S43. doi: 10.1016/j.jbi.2011.03.007. Epub 2011 Mar 21.
4. Concept annotation in the CRAFT corpus.
   BMC Bioinformatics. 2012 Jul 9;13:161. doi: 10.1186/1471-2105-13-161.
5. Discovering gene annotations in biomedical text databases.
   BMC Bioinformatics. 2008 Mar 6;9:143. doi: 10.1186/1471-2105-9-143.
6. Reuse of terminological resources for efficient ontological engineering in Life Sciences.
   BMC Bioinformatics. 2009 Oct 1;10 Suppl 10(Suppl 10):S4. doi: 10.1186/1471-2105-10-S10-S4.
7. Application and evaluation of automated semantic annotation of gene expression experiments.
   Bioinformatics. 2009 Jun 15;25(12):1543-9. doi: 10.1093/bioinformatics/btp259. Epub 2009 Apr 17.
8. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy.
   BMC Bioinformatics. 2009 Jan 21;10:28. doi: 10.1186/1471-2105-10-28.

Cited by

1. Processing genome-wide association studies within a repository of heterogeneous genomic datasets.
   BMC Genom Data. 2023 Mar 3;24(1):13. doi: 10.1186/s12863-023-01111-y.
2. Systematic tissue annotations of genomics samples by modeling unstructured metadata.
   Nat Commun. 2022 Nov 8;13(1):6736. doi: 10.1038/s41467-022-34435-x.
3. Bias-invariant RNA-sequencing metadata annotation.
   Gigascience. 2021 Sep 22;10(9). doi: 10.1093/gigascience/giab064.
4. ACE: the Advanced Cohort Engine for searching longitudinal patient records.
   J Am Med Inform Assoc. 2021 Jul 14;28(7):1468-1479. doi: 10.1093/jamia/ocab027.
5. SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes.
   BMC Bioinformatics. 2018 Nov 6;19(1):405. doi: 10.1186/s12859-018-2429-2.
6. DataMed - an open source discovery index for finding biomedical datasets.
   J Am Med Inform Assoc. 2018 Mar 1;25(3):300-308. doi: 10.1093/jamia/ocx121.
