DBSCAN and DBCV application to open medical records heterogeneous data for identifying clinically significant clusters of patients with neuroblastoma.

Authors

Chicco Davide, Oneto Luca, Cangelosi Davide

Affiliations

Università di Milano-Bicocca, Milan, Italy.

University of Toronto, Toronto, Ontario, Canada.

Publication

BioData Min. 2025 Jun 12;18(1):40. doi: 10.1186/s13040-025-00455-8.

Abstract

Neuroblastoma is a common pediatric cancer that affects thousands of children worldwide, especially those under five years of age. Although recovery is possible in 80% of cases, only 40% of patients with high-risk stage four neuroblastoma survive. Electronic health records of patients with this disease contain valuable data that biomedical informatics researchers can analyze using computational intelligence and statistical software. Unsupervised machine learning methods, in particular, can identify clinically significant subgroups of patients, which can lead to new therapies or medical treatments for future patients belonging to the same subgroups. However, access to these datasets is often restricted, making it difficult to obtain them for independent research projects. In this study, we retrieved three open datasets containing data from patients diagnosed with neuroblastoma: the Genoa dataset and the Shanghai dataset from the Neuroblastoma Electronic Health Records Open Data Repository, and a dataset from the renowned TARGET-NBL program. We analyzed these datasets using several clustering techniques and evaluated the results with the DBCV (Density-Based Clustering Validation) index. Among these algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was the only one that produced meaningful results. We scrutinized the two clusters of patient profiles identified by DBSCAN in the three datasets and recognized several relevant clinical variables that clearly partitioned the patients into two clusters that have clinical meaning in the neuroblastoma literature. Our results can have a significant impact on health informatics, because any computational analyst wishing to cluster small datasets of patients with a rare disease can choose DBSCAN and DBCV rather than more common methods such as k-Means and the Silhouette coefficient.
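To make the density-based idea concrete, here is a minimal pure-Python sketch of DBSCAN on a toy dataset. It is not the authors' pipeline: the `patients` list, its two numeric features, and the `eps` and `min_samples` values are all made up for illustration, and the DBCV scoring step mentioned in the abstract is not shown.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Hypothetical toy data: two dense groups plus one isolated point.
patients = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.0),
]
labels = dbscan(patients, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-Means, this procedure does not need the number of clusters in advance and can leave isolated records unassigned as noise. Real clinical records mix numeric and categorical variables, so they would need encoding and scaling before any distance-based method like this could be applied.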
