Computational cluster validation in post-genomic data analysis.

Authors

Handl Julia, Knowles Joshua, Kell Douglas B

Affiliation

School of Chemistry, University of Manchester, Faraday Building, Sackville Street, PO Box 88, Manchester M60 1QD, UK.

Publication information

Bioinformatics. 2005 Aug 1;21(15):3201-12. doi: 10.1093/bioinformatics/bti517. Epub 2005 May 24.

DOI: 10.1093/bioinformatics/bti517
PMID: 15914541
Abstract

MOTIVATION

The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.

RESULTS

This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.

AVAILABILITY

The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.

SUPPLEMENTARY INFORMATION

Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.

Similar articles

1
Computational cluster validation in post-genomic data analysis.
Bioinformatics. 2005 Aug 1;21(15):3201-12. doi: 10.1093/bioinformatics/bti517. Epub 2005 May 24.
2
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
3
A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets.
Bioinformatics. 2009 May 1;25(9):1152-7. doi: 10.1093/bioinformatics/btp123. Epub 2009 Mar 4.
4
How does gene expression clustering work?
Nat Biotechnol. 2005 Dec;23(12):1499-501. doi: 10.1038/nbt1205-1499.
5
Graph-based consensus clustering for class discovery from gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
6
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.
BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397.
7
Clustering and re-clustering for pattern discovery in gene expression data.
J Bioinform Comput Biol. 2005 Apr;3(2):281-301. doi: 10.1142/s0219720005001053.
8
Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data.
Bioinformatics. 2007 Sep 1;23(17):2247-55. doi: 10.1093/bioinformatics/btm320. Epub 2007 Jun 27.
9
VISDA: an open-source caBIG analytical tool for data clustering and beyond.
Bioinformatics. 2007 Aug 1;23(15):2024-7. doi: 10.1093/bioinformatics/btm290. Epub 2007 May 31.
10
Noise-robust soft clustering of gene expression time-course data.
J Bioinform Comput Biol. 2005 Aug;3(4):965-88. doi: 10.1142/s0219720005001375.

Cited by

1
Exploring the Transitivity Assumption in Network Meta-Analysis: A Novel Approach and Its Implications.
Stat Med. 2025 Mar 30;44(7):e70068. doi: 10.1002/sim.70068.
2
Suboptimal Comparison of Partitions.
J Classif. 2020 Jul;37(2):435-461. doi: 10.1007/s00357-019-09329-1. Epub 2019 Jul 11.
3
Translational Algorithms for Technological Dietary Quality Assessment Integrating Nutrimetabolic Data with Machine Learning Methods.
Nutrients. 2024 Nov 7;16(22):3817. doi: 10.3390/nu16223817.
4
Temporal patterns of energy intake and physical activity and cross-sectional associations with body weight status in children and adolescents: results from the Portuguese National Food, Nutrition and Physical Activity Survey 2015-2016.
Br J Nutr. 2024 Dec 28;132(12):1684-1697. doi: 10.1017/S0007114524002861. Epub 2024 Nov 11.
5
Development and validation of the Upstream Social Interaction Risk Scale (U-SIRS-13): a scale to assess threats to social connectedness among older adults.
Front Public Health. 2024 Sep 16;12:1454847. doi: 10.3389/fpubh.2024.1454847. eCollection 2024.
6
Determination of the number of clusters through logistic regression analysis.
J Appl Stat. 2023 Nov 20;51(12):2344-2363. doi: 10.1080/02664763.2023.2283687. eCollection 2024.
7
Who Benefits Most from the Family Education and Support Program in Cape Verde? A Cluster Analysis.
Children (Basel). 2024 Jun 27;11(7):782. doi: 10.3390/children11070782.
8
SillyPutty: Improved clustering by optimizing the silhouette width.
PLoS One. 2024 Jun 7;19(6):e0300358. doi: 10.1371/journal.pone.0300358. eCollection 2024.
9
Understanding and including 'pink-collar' workers in employment-based travel demand models.
PLoS One. 2024 Apr 18;19(4):e0301001. doi: 10.1371/journal.pone.0301001. eCollection 2024.
10
Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables.
Methodology (Gott). 2024 Mar 11;73(2):314-339. doi: 10.1093/jrsssc/qlad097. Epub 2023 Nov 8.