
Proximity measures for clustering gene expression microarray data: a validation methodology and a comparative analysis.

Affiliations

University of São Paulo, São Carlos.

Federal University of Pernambuco, Recife and Aachen University Medical School, RWTH Aachen.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):845-57. doi: 10.1109/TCBB.2013.9.

DOI:10.1109/TCBB.2013.9
PMID:24334380
Abstract

Cluster analysis is usually the first step adopted to unveil information from gene expression microarray data. Besides selecting a clustering algorithm, choosing an appropriate proximity measure (similarity or distance) is of great importance to achieve satisfactory clustering results. Nevertheless, to date, there are no comprehensive guidelines on how to choose proximity measures for clustering microarray data. Pearson correlation is the most widely used proximity measure, whereas the characteristics of other measures remain unexplored. In this paper, we investigate the choice of proximity measures for clustering microarray data by evaluating the performance of 16 proximity measures on 52 data sets from time-course and cancer experiments. Our results indicate that measures rarely employed in the gene expression literature can provide better results than commonly employed ones, such as Pearson, Spearman, and Euclidean distance. Given that different measures stood out in the time-course and cancer data evaluations, the choice of measure should be specific to each scenario. To evaluate measures on time-course data, we preprocessed and compiled 17 data sets from the microarray literature into a benchmark and developed a new methodology, called Intrinsic Biological Separation Ability (IBSA). Both can be employed in future research to assess the effectiveness of new measures for gene time-course data.
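
To make the point about measure choice concrete, the following minimal sketch compares the three commonly used measures named in the abstract (Pearson, Spearman, and Euclidean distance) on toy expression profiles. It is an illustration only, not the paper's evaluation pipeline or the IBSA methodology. The profiles `gene_a`, `gene_b`, `gene_c`, the helper names, and the conversion of a correlation to a distance as `1 - r` are all assumptions introduced here.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance: sensitive to differences in absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation: compares the linear shape of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson_correlation(average_ranks(x), average_ranks(y))


def pearson_distance(x, y):
    # Assumption: distance taken as 1 - r, one common convention among several.
    return 1.0 - pearson_correlation(x, y)


def spearman_distance(x, y):
    # Assumption: same 1 - r convention, applied to the rank correlation.
    return 1.0 - spearman_correlation(x, y)


def nearest_neighbor(query, candidates, distance):
    """Return the name of the candidate profile closest to `query`."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Hypothetical toy profiles over five time points.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [10.0, 20.0, 30.0, 40.0, 50.0]  # same trend as gene_a, 10x the magnitude
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]       # same magnitude as gene_a, opposite trend

candidates = {"gene_b": gene_b, "gene_c": gene_c}
for label, measure in [
    ("euclidean", euclidean_distance),
    ("pearson", pearson_distance),
    ("spearman", spearman_distance),
]:
    print(label, "->", nearest_neighbor(gene_a, candidates, measure))
```

In this toy setup the correlation-based distances pair `gene_a` with the co-varying `gene_b`, while Euclidean distance pairs it with the oppositely trending `gene_c` because their magnitudes are similar. The same data can therefore cluster differently depending on the measure, which is the kind of sensitivity the study examines across 16 measures.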


Similar articles

1
Proximity measures for clustering gene expression microarray data: a validation methodology and a comparative analysis.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):845-57. doi: 10.1109/TCBB.2013.9.
2
Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework.
Mol Biosyst. 2016 Oct 18;12(11):3478-3501. doi: 10.1039/c6mb00609d.
3
Evaluation of clustering algorithms for gene expression data.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.
4
Gene expression data clustering using a multiobjective symmetry based clustering technique.
Comput Biol Med. 2013 Nov;43(11):1965-77. doi: 10.1016/j.compbiomed.2013.07.021. Epub 2013 Sep 7.
5
Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer.
BMC Bioinformatics. 2008 Oct 29;9:462. doi: 10.1186/1471-2105-9-462.
6
Inferential clustering approach for microarray experiments with replicated measurements.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Oct-Dec;6(4):594-604. doi: 10.1109/TCBB.2008.106.
7
Metric for measuring the effectiveness of clustering of DNA microarray expression.
BMC Bioinformatics. 2006 Sep 6;7 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-7-S2-S5.
8
Clustering of gene expression data: performance and similarity analysis.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.
9
Novel symmetry-based gene-gene dissimilarity measures utilizing Gene Ontology: Application in gene clustering.
Gene. 2018 Dec 30;679:341-351. doi: 10.1016/j.gene.2018.08.062. Epub 2018 Sep 2.
10
Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples.
Comput Biol Med. 2010 Aug;40(8):723-32. doi: 10.1016/j.compbiomed.2010.06.007. Epub 2010 Jul 16.

Cited by

1
The miRNome of canine invasive urothelial carcinoma.
Front Vet Sci. 2022 Aug 22;9:945638. doi: 10.3389/fvets.2022.945638. eCollection 2022.
2
Heterogeneous data integration methods for patient similarity networks.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac207.
3
An information-theoretic approach for measuring the distance of organ tissue samples using their transcriptomic signatures.
Bioinformatics. 2021 Jan 29;36(21):5194-5204. doi: 10.1093/bioinformatics/btaa654.
4
Deep learning-based clustering approaches for bioinformatics.
Brief Bioinform. 2021 Jan 18;22(1):393-415. doi: 10.1093/bib/bbz170.
5
Identifying gene-specific subgroups: an alternative to biclustering.
BMC Bioinformatics. 2019 Dec 3;20(1):625. doi: 10.1186/s12859-019-3289-0.
6
Pairwise gene GO-based measures for biclustering of high-dimensional expression data.
BioData Min. 2018 Mar 27;11:4. doi: 10.1186/s13040-018-0165-9. eCollection 2018.
7
A systematic comparative evaluation of biclustering techniques.
BMC Bioinformatics. 2017 Jan 23;18(1):55. doi: 10.1186/s12859-017-1487-1.
8
Pathobiochemical signatures of cholestatic liver disease in bile duct ligated mice.
BMC Syst Biol. 2015 Nov 20;9:83. doi: 10.1186/s12918-015-0229-0.
9
An overview of bioinformatics methods for modeling biological pathways in yeast.
Brief Funct Genomics. 2016 Mar;15(2):95-108. doi: 10.1093/bfgp/elv040. Epub 2015 Oct 17.
10
On the selection of appropriate distances for gene expression data clustering.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2105-15-S2-S2. Epub 2014 Jan 24.