
Dynamically weighted clustering with noise set.

Affiliation

Department of Statistics at University of California, Los Angeles, CA 90095, USA.

Publication information

Bioinformatics. 2010 Feb 1;26(3):341-7. doi: 10.1093/bioinformatics/btp671. Epub 2009 Dec 9.

DOI: 10.1093/bioinformatics/btp671
PMID: 20007256
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2815660/

Abstract

MOTIVATION

Various clustering methods have been applied to microarray gene expression data for identifying genes with similar expression profiles. As the biological annotation data accumulated, more and more genes have been organized into functional categories. Functionally related genes may be regulated by common cellular signals, thus likely to be co-expressed. Consequently, utilizing the rapidly increasing functional annotation resources such as Gene Ontology (GO) to improve the performance of clustering methods is of great interest. On the opposite side of clustering, there are genes that have distinct expression profiles and do not co-express with other genes. Identification of these scattered genes could enhance the performance of clustering methods.

RESULTS

We developed a new clustering algorithm, Dynamically Weighted Clustering with Noise set (DWCN), which makes use of gene annotation information and allows for a set of scattered genes, the noise set, to be left out of the main clusters. We tested the DWCN method and contrasted its results with those obtained using several common clustering techniques on a simulated dataset as well as on two public datasets: the Stanford yeast cell-cycle gene expression data, and a gene expression dataset for a group of genetically different yeast segregants.
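
The idea of a noise set can be illustrated with a small sketch. The code below is not the DWCN algorithm, whose details are not given in this abstract; it is a plain k-means variant written under stated assumptions. The function name `kmeans_with_noise`, the distance threshold `noise_cutoff`, and the per-gene `weights` (a stand-in for how annotation might influence a cluster center) are all hypothetical.

```python
from math import dist


def kmeans_with_noise(points, centroids, noise_cutoff, weights=None, n_iter=20):
    """Toy k-means that leaves far-away points in a noise set (label -1).

    points       -- list of equal-length tuples (e.g. expression profiles)
    centroids    -- initial cluster centers, one tuple per cluster
    noise_cutoff -- hypothetical threshold: a point farther than this from
                    every center is treated as scattered (noise)
    weights      -- hypothetical per-point weights, e.g. larger for genes
                    with functional annotation; defaults to equal weights
    """
    if weights is None:
        weights = [1.0] * len(points)
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center, or -1 if no center is close enough.
        labels = []
        for p in points:
            d, k = min((dist(p, c), k) for k, c in enumerate(centroids))
            labels.append(k if d <= noise_cutoff else -1)
        # Update step: weighted mean of each cluster's members.
        # Noise points do not contribute to any center.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == k]
            total = sum(w for _, w in members)
            if total == 0:
                new_centroids.append(c)  # empty cluster: keep its old center
                continue
            new_centroids.append(
                tuple(sum(w * p[j] for p, w in members) / total for j in range(len(c)))
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Two tight groups plus one scattered point that should end up as noise.
profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
            (5.0, 30.0)]
labels, centers = kmeans_with_noise(profiles, [(0.0, 0.0), (10.0, 10.0)], noise_cutoff=3.0)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Because the scattered point is excluded from the update step, it cannot drag a cluster center toward itself, which is one plausible reason that identifying such genes could help the main clusters stay coherent.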

CONCLUSION

Our method produces clusters with more consistent functional annotations and more coherent expression patterns than existing clustering techniques.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
