A novel method for subgroup discovery in precision medicine based on topological data analysis.

Author information

Loughrey Ciara F, Maguire Sarah, Dłotko Paweł, Bai Lu, Orr Nick, Jurek-Loughrey Anna

Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK.

Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK.

Publication information

BMC Med Inform Decis Mak. 2025 Mar 19;25(1):139. doi: 10.1186/s12911-025-02852-9.

Abstract

BACKGROUND

The Mapper algorithm is a topological data mining tool that can help us obtain a higher-level understanding of disease by visualising the structure of patient data as a similarity graph. It has been successfully applied in the exploratory analysis of cancer data, delivering several significant subgroup discoveries. In practice, however, using the Mapper algorithm requires setting multiple parameters, and the resulting graph then has to be analysed manually with respect to the research question at hand. The literature has highlighted that Mapper's parameters have a significant impact on the shape of the output graph and that there is no established way to select their optimal values. Hence, when using the Mapper algorithm, a range of parameter values, and consequently a range of output graphs, must be studied. This prevents routine application of the Mapper algorithm in real-world settings.
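
To make the parameter problem concrete, here is a minimal sketch of the standard Mapper construction in plain Python. It is not the authors' implementation: the lens (the x-coordinate), the uniform overlapping interval cover and the single-linkage clustering with threshold eps are illustrative choices, and the names mapper, n_intervals, overlap and eps are hypothetical. Each of these parameters changes the output graph.

```python
# Illustrative sketch only, not the authors' implementation: the lens, the uniform
# interval cover and single-linkage clustering are assumed choices.
import math
from itertools import combinations


def single_linkage_clusters(indices, points, eps):
    """Split point indices into clusters: two points share a cluster if a chain
    of neighbours, each pair closer than eps, joins them."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) < eps]
            for j in near:
                unvisited.remove(j)
                cluster.add(j)
                stack.append(j)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals, overlap, eps):
    """Minimal Mapper graph.

    lens        -- one filter value per point
    n_intervals -- number of overlapping intervals covering the lens range
    overlap     -- fraction of each interval shared with its neighbour, in [0, 1)
    eps         -- distance threshold of the clustering step
    Returns (nodes, edges): each node is a frozenset of point indices; an edge
    joins two nodes that share at least one point.
    """
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length  # avoid round-off at the top
        members = [i for i, f in enumerate(lens) if a <= f <= b]
        nodes.extend(single_linkage_clusters(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


def count_components(n_nodes, edges):
    """Number of connected components of the graph (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(i) for i in range(n_nodes)})


# Usage: two concentric circles (radii 1 and 2, 40 points each), lens = x-coordinate.
pts = [(r * math.cos(2 * math.pi * k / 40), r * math.sin(2 * math.pi * k / 40))
       for r in (1.0, 2.0) for k in range(40)]
nodes, edges = mapper(pts, [x for x, _ in pts], n_intervals=5, overlap=0.3, eps=0.5)
print(len(nodes), len(edges), count_components(len(nodes), edges))  # 14 14 2
```

With these settings the sketch recovers one loop per circle; changing n_intervals, overlap or eps changes the node and edge counts, which is the parameter sensitivity the paragraph describes.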

METHODS

We propose a new algorithm for subgroup discovery within the Mapper graph. We refer to this task as hotspot detection, as the algorithm is designed to identify homogeneous and geometrically compact subsets of patients that are distinct with respect to their clinical or molecular profiles (e.g. survival). Furthermore, we propose using the existence of a hotspot as a criterion when searching the parameter space, thereby addressing one of the key limitations of the Mapper algorithm: parameter selection.
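
The abstract does not specify the hotspot score, so the following is only a hypothetical sketch of the two ideas in this paragraph, not the authors' algorithm: flag connected groups of homogeneous Mapper nodes whose outcome rate clearly differs from the rest of the cohort, and keep only those parameter settings whose graph contains such a group. The thresholds node_thresh, min_patients and min_lift, the use of graph connectivity as a stand-in for geometric compactness, the toy graphs and all function names are assumptions.

```python
# Hypothetical hotspot rule for illustration only; the abstract does not give the
# authors' scoring, so the thresholds and names below are assumptions.
from itertools import product


def find_hotspots(nodes, edges, outcome, node_thresh=0.8, min_patients=5, min_lift=0.3):
    """Return hotspots as sorted lists of patient indices.

    nodes   -- list of sets of patient indices (Mapper nodes)
    edges   -- pairs (u, v) of node indices
    outcome -- 0/1 label per patient, e.g. 1 = poor prognosis
    A node is 'hot' (homogeneous) if at least node_thresh of its patients have
    outcome 1. Connected groups of hot nodes stand in for compactness; a group is
    kept if it covers >= min_patients patients and its outcome rate exceeds the
    cohort rate by >= min_lift (distinctness).
    """
    overall = sum(outcome) / len(outcome)
    hot = {i for i, m in enumerate(nodes)
           if sum(outcome[p] for p in m) / len(m) >= node_thresh}
    adj = {i: set() for i in hot}
    for u, v in edges:
        if u in hot and v in hot:
            adj[u].add(v)
            adj[v].add(u)
    seen, hotspots = set(), []
    for start in sorted(hot):
        if start in seen:
            continue
        group, stack = set(), [start]  # flood-fill one community of hot nodes
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        patients = set().union(*(nodes[n] for n in group))
        rate = sum(outcome[p] for p in patients) / len(patients)
        if len(patients) >= min_patients and rate - overall >= min_lift:
            hotspots.append(sorted(patients))
    return hotspots


def select_parameters(build_graph, grid, outcome, **rule):
    """Keep every parameter combination whose graph contains at least one hotspot."""
    kept = []
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        nodes, edges = build_graph(**params)
        if find_hotspots(nodes, edges, outcome, **rule):
            kept.append(params)
    return kept


# Usage with a toy cohort: patients 0-5 have poor prognosis (1), 6-11 do not (0).
outcome = [1] * 6 + [0] * 6


def toy_builder(n_intervals, overlap):
    """Hypothetical stand-in for a real Mapper run (overlap is ignored): a fine
    cover separates the poor-prognosis patients, a coarse cover mixes them."""
    if n_intervals >= 5:
        fine = [{0, 1, 2}, {2, 3, 4, 5}, {5, 6, 7}, {7, 8, 9}, {9, 10, 11}]
        return fine, [(0, 1), (1, 2), (2, 3), (3, 4)]
    coarse = [{0, 1, 6, 7}, {1, 2, 3, 7, 8, 9}, {3, 4, 5, 9, 10, 11}]
    return coarse, [(0, 1), (1, 2)]


nodes, edges = toy_builder(5, 0.2)
print(find_hotspots(nodes, edges, outcome))  # [[0, 1, 2, 3, 4, 5]]
print(select_parameters(toy_builder, {"n_intervals": [3, 5], "overlap": [0.2]}, outcome))
# [{'n_intervals': 5, 'overlap': 0.2}]
```

In this sketch, requiring a hotspot turns parameter selection into a filter over the grid: only settings that yield an outcome-enriched, connected subgroup survive, and only those graphs need to be inspected by hand.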

RESULTS

Two experiments were performed to demonstrate the efficacy of the algorithm: detection of an artificial hotspot in the Two Circles dataset, and a real-world case study of subgroup discovery in oestrogen receptor-positive (ER+) breast cancer. Our hotspot detection algorithm successfully identified graphs containing homogeneous communities of nodes within the Two Circles dataset. When the method was applied to gene expression data from ER+ breast cancer patients, appropriate parameters were identified to generate a Mapper graph revealing a hotspot of ER+ patients with poor prognosis and characteristic patterns of gene expression. This finding was subsequently confirmed in an independent breast cancer dataset.

CONCLUSIONS

Our proposed method can be applied effectively for subgroup discovery in pathology data. It allows us to find optimal parameters for the Mapper algorithm, bridging the gap between its potential and translational research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3faf/11921513/7a2b0b17e36a/12911_2025_2852_Fig1_HTML.jpg
