Biomedical Engineering, Oregon Health and Science University, Portland, OR.
Biomolecular Engineering Department, University of California, Santa Cruz, Santa Cruz, CA.
JCO Clin Cancer Inform. 2020 Feb;4:147-159. doi: 10.1200/CCI.19.00110.
The analysis of cancer biology data involves extremely heterogeneous data sets, including information from RNA sequencing, genome-wide copy number, DNA methylation data reporting on epigenetic regulation, somatic mutations from whole-exome or whole-genome analyses, pathology estimates from imaging sections or subtyping, drug response or other treatment outcomes, and various other clinical and phenotypic measurements. Bringing these different resources into a common framework, with a data model that allows for complex relationships as well as dense vectors of features, will unlock integrated data set analysis.
We introduce the BioMedical Evidence Graph (BMEG), a graph database and query engine for discovery and analysis of cancer biology. The BMEG differs from other biologic data graphs in that it connects sample-level molecular and clinical information to reference knowledge bases. It combines gene expression and mutation data with drug-response experiments, pathway information databases, and literature-derived associations.
The construction of the BMEG has resulted in a graph containing more than 41 million vertices and 57 million edges. The BMEG system provides a graph query-based application programming interface to enable analysis, with client code available for Python, JavaScript, and R, and a server online at bmeg.io. Using this system, we have demonstrated several forms of cross-data set analysis to show its utility.
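As a rough illustration of what a graph query over connected sample and knowledge-base data can look like, the following is a minimal in-memory sketch. It does not use the BMEG client or its API. The `ToyGraph` class, the vertex identifiers, the labels (`Sample`, `Gene`, `Compound`), and the edge names (`has_mutation_in`, `associated_with`) are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory property graph; not the BMEG implementation."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> (label, properties)
        self.out_edges = defaultdict(list)   # source id -> [(edge label, target id)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def out(self, vids, label):
        """Follow outgoing edges with the given label from each vertex in vids."""
        result = []
        for vid in vids:
            for edge_label, dst in self.out_edges[vid]:
                if edge_label == label and dst not in result:
                    result.append(dst)
        return result


# Made-up data: one sample carrying a dense feature vector, two genes, one compound.
g = ToyGraph()
g.add_vertex("sample:S1", "Sample", expression=[0.1, 2.3, 1.7])
g.add_vertex("gene:G1", "Gene", symbol="GENE_A")
g.add_vertex("gene:G2", "Gene", symbol="GENE_B")
g.add_vertex("drug:D1", "Compound", name="compound_x")
g.add_edge("sample:S1", "has_mutation_in", "gene:G1")
g.add_edge("sample:S1", "has_mutation_in", "gene:G2")
g.add_edge("gene:G1", "associated_with", "drug:D1")

# Two-hop traversal: sample -> mutated genes -> associated compounds.
mutated_genes = g.out(["sample:S1"], "has_mutation_in")
candidate_drugs = g.out(mutated_genes, "associated_with")
print(mutated_genes)
print(candidate_drugs)
```

The sketch shows how a single traversal can cross from sample-level data into reference knowledge, which is the kind of cross-data set question a graph query interface is meant to support. A production system would add indexing, filtering on vertex properties, and persistent storage.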
The BMEG is an evolving resource dedicated to enabling integrative analysis. We have demonstrated queries on the system that illustrate mutation significance analysis, drug-response machine learning, patient-level knowledge-base queries, and pathway-level analysis. We have compared the resulting graph with other available integrated graph systems and demonstrated that the BMEG is unique in the scale of the graph and the type of data it makes available.