
K-ary clustering with optimal leaf ordering for gene expression data.

Authors

Bar-Joseph Ziv, Demaine Erik D, Gifford David K, Srebro Nathan, Hamel Angèle M, Jaakkola Tommi S

Affiliation

Laboratory for Computer Science, MIT, 545 Technology Square, Cambridge, MA 02139, USA.

Publication

Bioinformatics. 2003 Jun 12;19(9):1070-8. doi: 10.1093/bioinformatics/btg030.

DOI: 10.1093/bioinformatics/btg030
PMID: 12801867

Abstract

MOTIVATION

A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships at scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, usually lacks a method for identifying distinct clusters, and produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree.

RESULTS

We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing the robustness. Our k-ary construction algorithm runs in O(n^3) regardless of k and our ordering algorithm runs in O(4^k n^3). We present several examples that show that our k-ary clustering algorithm achieves results that are superior to the binary tree results in both global presentation and cluster identification.
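
To make the idea of a leaf ordering concrete, here is a minimal Python sketch of what "optimally ordering the leaves" of a k-ary tree could mean. It is not the authors' algorithm: it enumerates every ordering by brute force, which is exponential, whereas the abstract reports an O(4^k n^3) method. The objective (maximizing the summed similarity of adjacent leaves), the nested-tuple tree encoding, the function names, and the toy similarity values are all assumptions made for illustration.

```python
from itertools import permutations, product


def leaf_orderings(tree):
    """Yield every leaf order consistent with the tree.

    A tree is an int (leaf index) or a tuple of subtrees (internal node).
    Children of a node may be permuted, but the leaves of each subtree
    always stay contiguous.
    """
    if isinstance(tree, int):
        yield (tree,)
        return
    for perm in permutations(tree):
        child_orders = [list(leaf_orderings(child)) for child in perm]
        for parts in product(*child_orders):
            yield tuple(leaf for part in parts for leaf in part)


def ordering_score(order, sim):
    """Sum of similarities between neighbouring leaves (assumed objective)."""
    return sum(sim[a][b] for a, b in zip(order, order[1:]))


def best_ordering(tree, sim):
    """Brute-force search; feasible only for very small trees."""
    return max(leaf_orderings(tree), key=lambda o: ordering_score(o, sim))


# Hypothetical toy data: five "genes" placed on a line; similarity is the
# negative distance, so similar genes should end up next to each other.
points = [0.0, 2.0, 1.0, 3.5, 3.0]
sim = [[-abs(p - q) for q in points] for p in points]

# A 3-ary tree: one node with three leaf children and one with two.
tree = ((0, 1, 2), (3, 4))

best = best_ordering(tree, sim)
print(best, ordering_score(best, sim))
```

Even this small tree admits 2! x 3! x 2! = 24 consistent orderings, which is why an efficient ordering algorithm matters once the tree has thousands of leaves.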

AVAILABILITY

We have implemented the above algorithms in C++ on the Linux operating system.

