Institute of Computer Science, Goethe-Universität Frankfurt, Frankfurt am Main, Germany.
Department of Human Genetics, Hannover Medical School, Hannover, Germany.
BMC Bioinformatics. 2022 Dec 12;23(1):537. doi: 10.1186/s12859-022-05092-0.
Medical databases typically contain large amounts of data in a variety of forms. Although these data offer significant insights into diagnosis and treatment, data exploration is difficult to implement in current medical databases: they are often based on a relational schema, from which information for cohort analysis and visualization cannot be extracted easily. As a consequence, valuable information regarding cohort distribution or patient similarity may be missed. With the rapid advancement of biomedical technologies, new forms of data from methods such as Next Generation Sequencing (NGS) or chromosome microarray (array CGH) are constantly being generated; the amount and complexity of medical data can therefore be expected to rise and push relational database systems to their limits.
We present Graph4Med, a web application that relies on a graph database obtained by transforming a relational database. Graph4Med provides straightforward visualization and analysis of a selected patient cohort. Our use case is a database of pediatric Acute Lymphoblastic Leukemia (ALL). Alongside routine patient health records, it also contains results from recent technologies such as NGS. We developed a suitable graph data schema to convert the relational data into a graph data structure and store it in Neo4j. We used NeoDash to build a dashboard for querying and displaying patient cohort analyses. In this way, our tool (1) quickly displays an overview of patient cohort information such as the distributions of gender, age, mutations (fusions), and diagnosis; (2) provides a mutation (fusion) based similarity search and displays the results in a navigable graph; (3) generates an interactive graph of any selected patient and facilitates the identification of interesting patterns among patients.
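The conversion and similarity search described above can be illustrated with a minimal Python sketch. It is not the Graph4Med implementation: the table layout, the node labels `Patient` and `Fusion`, the relationship name `HAS_FUSION`, the function names, and the use of the number of shared fusions as the similarity measure are all assumptions made for illustration, and the patient rows are invented. The graph is held in plain Python structures rather than in Neo4j so that the example stays self-contained.

```python
from collections import defaultdict

# Invented rows standing in for two relational tables (assumed layout).
patients = [
    {"patient_id": 1, "gender": "F", "age": 4},
    {"patient_id": 2, "gender": "M", "age": 7},
    {"patient_id": 3, "gender": "F", "age": 5},
    {"patient_id": 4, "gender": "M", "age": 1},
]
patient_fusions = [
    {"patient_id": 1, "fusion": "ETV6-RUNX1"},
    {"patient_id": 1, "fusion": "BCR-ABL1"},
    {"patient_id": 2, "fusion": "ETV6-RUNX1"},
    {"patient_id": 3, "fusion": "BCR-ABL1"},
    {"patient_id": 3, "fusion": "ETV6-RUNX1"},
    {"patient_id": 4, "fusion": "KMT2A-AFF1"},
]


def build_graph(patient_rows, fusion_rows):
    """Turn relational rows into labelled nodes and typed edges."""
    nodes = {}  # (label, key) -> property dict
    edges = []  # (source node, relationship type, target node)
    for row in patient_rows:
        nodes[("Patient", row["patient_id"])] = {
            "gender": row["gender"],
            "age": row["age"],
        }
    for row in fusion_rows:
        fusion_node = ("Fusion", row["fusion"])
        nodes.setdefault(fusion_node, {})  # one node per distinct fusion
        edges.append((("Patient", row["patient_id"]), "HAS_FUSION", fusion_node))
    return nodes, edges


def similar_patients(edges, patient_id):
    """Rank other patients by the number of fusions shared with patient_id."""
    fusions_of = defaultdict(set)
    patients_with = defaultdict(set)
    for (_, pid), rel, (_, fusion) in edges:
        if rel == "HAS_FUSION":
            fusions_of[pid].add(fusion)
            patients_with[fusion].add(pid)
    shared = defaultdict(set)
    for fusion in fusions_of.get(patient_id, set()):
        for other in patients_with[fusion]:
            if other != patient_id:
                shared[other].add(fusion)
    # Most shared fusions first; ties broken by patient id.
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


nodes, edges = build_graph(patients, patient_fusions)
ranking = similar_patients(edges, 1)
```

With the invented rows above, `ranking` lists patient 3 (two shared fusions) ahead of patient 2 (one shared fusion), and patient 4 does not appear because it shares no fusion with patient 1. In a graph database, the same question becomes a two-hop traversal from a patient through its fusion nodes to other patients, which is the kind of query a relational schema answers only through joins.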
We demonstrate the feasibility and advantages of a graph database for storing and querying medical data. Our dashboard allows fast and interactive analysis and visualization of complex medical data. It is especially useful for patient similarity search based on mutations (fusions), for which vast amounts of data have been generated by NGS in recent years. It can reveal relationships and patterns in patient cohorts that are normally hard to grasp. Expanding Graph4Med to more medical databases will bring novel insights into diagnostics and research.