Exposing the cancer genome atlas as a SPARQL endpoint.

Affiliation

Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Unit 1410, Houston, TX 77230-1402, USA.

Publication information

J Biomed Inform. 2010 Dec;43(6):998-1008. doi: 10.1016/j.jbi.2010.09.004.

Abstract

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating their results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries over both public and protected data without the need for prior submission to a single data source.
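To make the idea of assembling SPARQL queries from a representation of the domain more concrete, here is a minimal sketch in Python. It is not the S3DB implementation or its API: the `http://example.org/tcga/` namespace, the `Step` class, the `assemble_query` function, and the `Patient`, `gender` and `tumorType` names are all hypothetical, chosen only to illustrate how a path chosen through domain descriptors could be turned into triple patterns whose variables bind the data elements.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration; not a real S3DB or TCGA URI.
EX = "http://example.org/tcga/"


@dataclass(frozen=True)
class Step:
    """One descriptor picked while navigating the domain representation."""
    predicate: str  # local name of a (hypothetical) domain descriptor
    variable: str   # SPARQL variable that will bind the data element


def assemble_query(root_class: str, steps: list[Step], limit: int = 10) -> str:
    """Build a SELECT query: one triple pattern per selected descriptor."""
    select_vars = " ".join(f"?{s.variable}" for s in steps)
    lines = [
        f"PREFIX ex: <{EX}>",
        "",
        f"SELECT ?item {select_vars}",
        "WHERE {",
        f"  ?item a ex:{root_class} .",
    ]
    for s in steps:
        # Descriptors become predicates; data elements are left as variables.
        lines.append(f"  ?item ex:{s.predicate} ?{s.variable} .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = assemble_query(
    "Patient",
    [Step("gender", "gender"), Step("tumorType", "tumor")],
)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint over HTTP. Keeping descriptors (classes and properties) separate from data elements (the values bound to variables) is what lets a user interface build such queries by selection alone, without the user writing SPARQL by hand.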
