Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB).

Affiliations

Regional Computing Center of the University of Cologne (RRZK), Cologne, Germany.

Bioinformatics Facility, CECAD Research Center, University of Cologne, Cologne, Germany.

Publication information

BMC Bioinformatics. 2018 Apr 24;19(1):156. doi: 10.1186/s12859-018-2157-7.

Abstract

BACKGROUND

Recent cancer genome studies of many human cancer types have relied on multiple high-throughput molecular technologies. Despite the vast amount of data generated, surprisingly few databases facilitate access to these data and make them available to the broad research community for flexible analysis queries. If used in their entirety and provided in a highly structured form, these data can feed continually growing databases with enormous potential as a basis for machine learning, with the goal of supporting research and healthcare through predictions of clinically relevant traits.

RESULTS

We have developed the Cancer Systems Biology Database (CancerSysDB), a resource for highly flexible queries and analysis of cancer-related data across multiple data types and multiple studies. Any center can adopt the CancerSysDB to organize its locally acquired data and integrate them with publicly available data from multiple studies. A publicly available main instance of the CancerSysDB supports highly flexible queries across multiple data types, as shown by relevant use cases. In addition, we demonstrate how the CancerSysDB can be used for predictive cancer classification based on whole-exome data from 9091 patients in The Cancer Genome Atlas (TCGA) research network.

CONCLUSIONS

Our database has the potential to support large-scale integrative queries and predictive analytics of clinically relevant traits.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc4e/5921751/679870572ac3/12859_2018_2157_Fig1_HTML.jpg
