
Decision tree learning in Neo4j on homogeneous and unconnected graph nodes from biological and clinical datasets.

Affiliations

Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany.

Faculty of Process and Systems Engineering, Otto-von-Guericke-University Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany.

Publication information

BMC Med Inform Decis Mak. 2023 Mar 6;22(Suppl 6):347. doi: 10.1186/s12911-023-02112-8.

DOI: 10.1186/s12911-023-02112-8
PMID: 36879243
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9988195/
Abstract

BACKGROUND

Graph databases enable efficient storage of heterogeneous, highly interlinked data, such as clinical data. Researchers can then extract relevant features from these datasets and apply machine learning for diagnosis, biomarker discovery, or understanding pathogenesis.

METHODS

To facilitate machine learning and avoid the time spent extracting data from the graph database, we developed and optimized the Decision Tree Plug-in (DTP), which contains 24 procedures for generating and evaluating decision trees directly in the graph database Neo4j on homogeneous and unconnected nodes.
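
As a rough illustration of the core step in decision tree learning over homogeneous, unconnected nodes, the sketch below treats each node as a flat dictionary of properties and searches for the binary split with the lowest weighted Gini impurity. It is not the DTP implementation: the function names (`gini`, `best_split`), the property names (`bmi`, `systolic_bp`, `diabetes`), the toy records, and the choice of Gini impurity as the split criterion are all assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(nodes, features, target):
    """Return (feature, threshold, weighted_gini) for the best binary split.

    `nodes` is a list of dicts, one per graph node, holding numeric
    feature properties and a class label under `target`.
    """
    best = None
    for feature in features:
        values = sorted({node[feature] for node in nodes})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [n[target] for n in nodes if n[feature] <= threshold]
            right = [n[target] for n in nodes if n[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(nodes)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical toy records standing in for node properties (not study data).
patients = [
    {"bmi": 22.0, "systolic_bp": 118, "diabetes": 0},
    {"bmi": 24.5, "systolic_bp": 145, "diabetes": 0},
    {"bmi": 31.0, "systolic_bp": 122, "diabetes": 1},
    {"bmi": 35.2, "systolic_bp": 150, "diabetes": 1},
]

feature, threshold, impurity = best_split(patients, ["bmi", "systolic_bp"], "diabetes")
print(feature, threshold, impurity)
```

A full tree learner would apply `best_split` recursively to the left and right subsets until a stopping rule is met, such as pure leaves or a maximum depth. Because the nodes are homogeneous and unconnected, each one maps directly to a row of a feature table, so no graph traversal is needed in this step.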

RESULTS

Creating the decision tree for three clinical datasets directly in the graph database from the nodes took between 0.059 and 0.099 s, while computing the decision tree with the same algorithm in Java from CSV files took 0.085 to 0.112 s. For these small datasets, our approach was also faster than the standard decision tree implementation in R (0.62 s) and on par with Python (0.08 s), both of which used CSV files as input. In addition, we explored the strengths of DTP on a large dataset (approx. 250,000 instances) for predicting patients with diabetes and compared its performance against models produced by state-of-the-art packages in R and Python. Neo4j delivered competitive results in both prediction quality and time efficiency. Furthermore, we showed that high body-mass index and high blood pressure are the main risk factors for diabetes.

CONCLUSION

Overall, our work shows that integrating machine learning into graph databases saves the time of additional processing steps as well as external memory, and could be applied to a variety of use cases, including clinical applications. It gives users the advantages of high scalability, visualization, and complex querying.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/247184baa6c6/12911_2023_2112_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/1ba33ea2b1d3/12911_2023_2112_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/ba7bc67a8d1c/12911_2023_2112_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/9edc52f01edf/12911_2023_2112_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/1507dee9b7b6/12911_2023_2112_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/de7bba5e734a/12911_2023_2112_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/7d906903bede/12911_2023_2112_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bfd/9990184/8363bdb5af02/12911_2023_2112_Fig8_HTML.jpg
