An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Kerala 673601, India.

Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala 673601, India.

Publication details

Comput Biol Med. 2017 Dec 1;91:213-221. doi: 10.1016/j.compbiomed.2017.10.014. Epub 2017 Oct 23.

DOI: 10.1016/j.compbiomed.2017.10.014
PMID: 29100115
Abstract

BACKGROUND

Clustering algorithms whose steps involve randomness usually give different results on different runs over the same dataset. This non-determinism, seen in algorithms such as K-Means, limits their applicability in areas such as cancer subtype prediction from gene expression data. It also makes it hard to compare the results of such algorithms meaningfully with those of other algorithms. In K-Means, the non-determinism stems from the random selection of data points as initial centroids.
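The following minimal Python sketch shows where the non-determinism comes from. It runs standard K-Means (Lloyd's iterations) on a hypothetical four-point toy dataset. It is an illustration, not code from the paper, and the function names `lloyd` and `random_init` are made up for this example. The same iterations, started from two different pairs of initial data points, converge to two different partitions. A randomly drawn initialization can therefore change the final clustering from one run to the next.

```python
import math
import random


def lloyd(points, centroids, max_iter=100):
    """Lloyd's K-Means iterations from the given initial centroids.

    Returns one cluster label (an index into `centroids`) per point.
    """
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (min() keeps the lowest index on ties).
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def random_init(points, k, seed=None):
    """Standard K-Means initialization: k distinct data points drawn at random."""
    return random.Random(seed).sample(points, k)


# Hypothetical toy data: two tight columns, far apart along x.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Two different initial picks from the same data lead to different partitions.
print(lloyd(points, [points[0], points[1]]))  # [0, 1, 0, 1]: top/bottom split (poor local optimum)
print(lloyd(points, [points[0], points[2]]))  # [0, 0, 1, 1]: left/right split
print(random_init(points, 2))  # varies between runs when no seed is given
```

Fixing the seed makes `random_init` reproducible. In typical use, however, the seed is left unset, and that is the run-to-run variation described above.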

METHOD

We propose an improved, density-based version of K-Means with a novel, systematic method for selecting initial centroids. The key idea is to choose as initial centroids those data points that lie in dense regions and are adequately separated in feature space.
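The abstract does not say how density or separation is measured, so the Python sketch below fills in both with its own assumptions. Density is the inverse mean distance to the `m` nearest neighbors, where `m` is a hypothetical parameter. After the densest point is taken, each further centroid is the point that maximizes density times its distance to the nearest centroid already chosen. This illustrates a deterministic, density-based initialization in the spirit of the method described. It is not the authors' algorithm, and all function names are hypothetical.

```python
import math


def knn_density(points, m=3):
    """Hypothetical density score: 1 / mean distance to the m nearest neighbors.

    Assumes at least two points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:m]
        scores.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return scores


def deterministic_init(points, k, m=3):
    """Choose k initial centroids that are dense and well separated, with no randomness.

    Assumed rule (not taken from the paper): take the densest point first, then
    repeatedly take the point maximizing density * distance to its nearest chosen
    centroid. Ties go to the lower index, so the result is reproducible."""
    density = knn_density(points, m)
    chosen = [max(range(len(points)), key=lambda i: (density[i], -i))]

    def score(i):
        # High only for points that are dense AND far from every centroid so far.
        separation = min(math.dist(points[i], points[c]) for c in chosen)
        return (density[i] * separation, -i)

    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]


def lloyd(points, centroids, max_iter=100):
    """Standard Lloyd's K-Means iterations from fixed initial centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def deterministic_kmeans(points, k, m=3):
    """K-Means seeded by deterministic_init: the same input always yields the same labels."""
    return lloyd(points, deterministic_init(points, k, m))


# Hypothetical toy data: two dense groups plus one sparse outlier at (20, 0).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0), (20.0, 0.0)]
print(deterministic_init(points, 2))    # [(0.5, 0.5), (9.0, 9.0)]
print(deterministic_kmeans(points, 2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Every step is a deterministic function of the data, with ties going to the lower index, so repeated runs return identical labels. On this toy set the sparse point at (20, 0) is never picked as a seed. A pure farthest-point rule started from the same first seed would pick it.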

RESULTS

We compared the proposed algorithm with eleven widely used single clustering algorithms and with a prominent ensemble clustering algorithm used for cancer data classification, evaluating their performance on ten cancer gene expression datasets. The proposed algorithm showed better overall performance than the others.
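The abstract does not name the performance measure used in the comparison. One common way to score a clustering against known subtype labels is the adjusted Rand index (ARI). The stdlib-only Python sketch below computes ARI, assuming that ground-truth subtype labels are available for each sample. It shows the kind of external evaluation involved and is not the paper's evaluation protocol.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie form) between two flat partitions.

    1.0 means identical partitions up to relabeling; values near 0 mean
    chance-level agreement; negative values mean worse than chance."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to compare
    # Pairs placed together in both partitions (contingency-table cells).
    index = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately (row/column sums).
    sum_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    sum_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (index - expected) / (max_index - expected)


# Hypothetical subtype labels vs. cluster labels for six samples.
subtypes = ["A", "A", "B", "B", "C", "C"]
clusters = [2, 2, 0, 0, 1, 1]  # same grouping, different label names
print(adjusted_rand_index(subtypes, clusters))  # 1.0
```

ARI does not depend on how cluster IDs are numbered. This matters because a clustering algorithm's cluster 0 need not correspond to any particular subtype.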

CONCLUSION

The biomedical domain has a pressing need for simple, easy-to-use and more accurate machine learning tools for cancer subtype prediction. The proposed algorithm is simple and easy to use, and it gives stable results. Moreover, it predicts cancer subtypes from gene expression data better than the algorithms it was compared with.

Similar articles

1
An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.
Comput Biol Med. 2017 Dec 1;91:213-221. doi: 10.1016/j.compbiomed.2017.10.014. Epub 2017 Oct 23.
2
Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.
Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.
3
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):657-70. doi: 10.1109/TCBB.2013.59.
4
Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles.
IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):727-40. doi: 10.1109/TCBB.2014.2315996.
5
Analysis of k-means clustering approach on the breast cancer Wisconsin dataset.
Int J Comput Assist Radiol Surg. 2016 Nov;11(11):2033-2047. doi: 10.1007/s11548-016-1437-9. Epub 2016 Jun 16.
6
Evaluation of clustering algorithms for gene expression data.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.
7
Knowledge based cluster ensemble for cancer discovery from biomolecular data.
IEEE Trans Nanobioscience. 2011 Jun;10(2):76-85. doi: 10.1109/TNB.2011.2144997. Epub 2011 Jul 7.
8
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.
Comput Biol Med. 2016 Apr 1;71:135-48. doi: 10.1016/j.compbiomed.2016.02.007. Epub 2016 Feb 21.
9
[Cluster ensemble algorithm based on dual neural gas applied to cancer gene expression profiles].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2015 Feb;32(1):93-8.
10
PK-means: A new algorithm for gene clustering.
Comput Biol Chem. 2008 Aug;32(4):243-7. doi: 10.1016/j.compbiolchem.2008.03.020. Epub 2008 May 27.

Cited by

1
A densely connected framework for cancer subtype classification.
BMC Bioinformatics. 2025 Jul 18;26(1):183. doi: 10.1186/s12859-025-06230-0.
2
Association between treatment response and dose of blonanserin transdermal patch in patients with acute schizophrenia: A post hoc cluster analysis based on baseline psychiatric symptoms.
Neuropsychopharmacol Rep. 2024 Dec;44(4):784-791. doi: 10.1002/npr2.12490. Epub 2024 Oct 20.
3
A Contrastive-Learning-Based Deep Neural Network for Cancer Subtyping by Integrating Multi-Omics Data.
Interdiscip Sci. 2024 Dec;16(4):966-975. doi: 10.1007/s12539-024-00641-y. Epub 2024 Sep 4.
4
The novel hierarchical clustering approach using self-organizing map with optimum dimension selection.
Health Care Sci. 2024 Apr 11;3(2):88-100. doi: 10.1002/hcs2.90. eCollection 2024 Apr.
5
Machine learning approaches for biomolecular, biophysical, and biomaterials research.
Biophys Rev (Melville). 2022 Jun 3;3(2):021306. doi: 10.1063/5.0082179. eCollection 2022 Jun.
6
The immunogenic radiation and new players in immunotherapy and targeted therapy for head and neck cancer.
Front Oral Health. 2023 Jul 11;4:1180869. doi: 10.3389/froh.2023.1180869. eCollection 2023.
7
ForestSubtype: a cancer subtype identifying approach based on high-dimensional genomic data and a parallel random forest.
BMC Bioinformatics. 2023 Jul 19;24(1):289. doi: 10.1186/s12859-023-05412-y.
8
A novel autophagy-related subtypes to distinguish immune phenotypes and predict immunotherapy response in head and neck squamous cell carcinoma.
Biomol Biomed. 2023 Nov 3;23(6):997-1013. doi: 10.17305/bb.2023.9094.
9
Integrated genomic analysis defines molecular subgroups in dilated cardiomyopathy and identifies novel biomarkers based on machine learning methods.
Front Genet. 2023 Feb 7;14:1050696. doi: 10.3389/fgene.2023.1050696. eCollection 2023.
10
Modification of m5C regulators in sarcoma can guide different immune infiltrations as well as immunotherapy.
Front Surg. 2023 Jan 6;9:948371. doi: 10.3389/fsurg.2022.948371. eCollection 2022.