
A novel specialized single-linkage clustering algorithm for taxonomically ordered data.

Authors

Schmidt Markus, Kutzner Arne, Heese Klaus

Affiliations

Department of Computer Science, Friedrich-Alexander University, Martensstr. 3, Erlangen-Nürnberg, Germany.

Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 133-791, Republic of Korea.

Publication information

J Theor Biol. 2017 Aug 1;427:1-7. doi: 10.1016/j.jtbi.2017.05.008. Epub 2017 May 15.

DOI: 10.1016/j.jtbi.2017.05.008
PMID: 28522359
Abstract

Similarities among ortholog genes for a given set of species S can be expressed by alignment matrices, where each matrix cell results from aligning a gene transcript against the genome of a species within S. Gene clusters can be computed using single-linkage clustering in O(n × m) time, where n denotes the number of ortholog genes and m denotes the number of inspected assemblies. Our approach can break the O(n × m) complexity of single-linkage clustering by exploiting an order among species that results from an in-order traversal of a given phylogenetic tree. The order among species allows the inspected scope of the matrix to be reduced to taxonomically related combinations of assemblies and genes, thus lowering the computational effort necessary for creating the alignment matrix without affecting cluster quality. We present two novel approaches for clustering. First, we introduce a hierarchical clustering that, apart from the initial sorting of |S| elements, runs in amortized O(|S|) time, where |S| ≤ n + m. Then, we propose a consecutive clustering with linear time complexity O(|S|). Both approaches compute identical clusters, whereas dendrograms can only be obtained from the hierarchical one. We prove that our approaches deliver higher cluster densities than single-linkage clustering. Additionally, we show that we compute clusters of superior quality, which ensures that our approaches are generally less error prone.
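
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a binary phylogenetic tree, derives a species order by in-order traversal, and then groups already-sorted positions in a single linear pass by splitting wherever the gap between neighbours exceeds a threshold. The names `Node`, `in_order_species`, `consecutive_clusters`, and `max_gap` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary phylogenetic tree node; leaves carry a species name."""
    name: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order_species(root: Optional[Node]) -> List[str]:
    """Collect leaf (species) names in in-order traversal order."""
    order: List[str] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        visit(node.left)
        if node.left is None and node.right is None:
            order.append(node.name)
        visit(node.right)

    visit(root)
    return order


def consecutive_clusters(positions: List[int], max_gap: int) -> List[List[int]]:
    """Group sorted positions in one pass, starting a new cluster
    whenever the gap to the previous position exceeds max_gap (assumed criterion)."""
    clusters: List[List[int]] = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Toy tree: ((human, chimp), (mouse, rat))
tree = Node(
    left=Node(left=Node("human"), right=Node("chimp")),
    right=Node(left=Node("mouse"), right=Node("rat")),
)
species_order = in_order_species(tree)

# Positions (indices into species_order) where a gene is assumed to have a hit.
hit_positions = [0, 1, 3]
clusters = consecutive_clusters(hit_positions, max_gap=1)
```

Because the input is already ordered, each position is compared only with its predecessor, so the pass is linear in the number of positions. For a fixed distance threshold on one-dimensional sorted data, this gap-splitting yields the same groups as a single-linkage cut at that threshold.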

Similar articles

1
A novel specialized single-linkage clustering algorithm for taxonomically ordered data.
J Theor Biol. 2017 Aug 1;427:1-7. doi: 10.1016/j.jtbi.2017.05.008. Epub 2017 May 15.
2
Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.
J Chem Theory Comput. 2007 Nov;3(6):2312-34. doi: 10.1021/ct700119m.
3
Knowledge-assisted recognition of cluster boundaries in gene expression data.
Artif Intell Med. 2005 Sep-Oct;35(1-2):171-83. doi: 10.1016/j.artmed.2005.02.007.
4
A cross-species bi-clustering approach to identifying conserved co-regulated genes.
Bioinformatics. 2016 Jun 15;32(12):i137-i146. doi: 10.1093/bioinformatics/btw278.
5
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
6
Ortholog clustering on a multipartite graph.
IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):17-27. doi: 10.1109/TCBB.2007.1004.
7
Machine-learned cluster identification in high-dimensional data.
J Biomed Inform. 2017 Feb;66:95-104. doi: 10.1016/j.jbi.2016.12.011. Epub 2016 Dec 28.
8
A hybrid clustering approach to recognition of protein families in 114 microbial genomes.
BMC Bioinformatics. 2004 Apr 29;5:45. doi: 10.1186/1471-2105-5-45.
9
A graph-based clustering method for a large set of sequences using a graph partitioning algorithm.
Genome Inform. 2001;12:93-102.
10
Clustering of gene expression data: performance and similarity analysis.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

Cited by

1
Advanced DNA fingerprint genotyping based on a model developed from real chip electrophoresis data.
J Adv Res. 2019 Jan 25;18:9-18. doi: 10.1016/j.jare.2019.01.005. eCollection 2019 Jul.