
Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results.

Authors

Zhao Shitao, Sun Jianqiang, Shimizu Kentaro, Kadota Koji

Affiliation

Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan.

Publication information

Biol Proced Online. 2018 Mar 1;20:5. doi: 10.1186/s12575-018-0067-8. eCollection 2018.

DOI:10.1186/s12575-018-0067-8
PMID:29507534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5831220/
Abstract

BACKGROUND

Hierarchical sample clustering (HSC) is widely performed to examine associations within expression data obtained from microarrays and RNA sequencing (RNA-seq). Researchers have investigated HSC results using several possible grouping criteria (e.g., sex, age, and disease type). However, the evaluation of arbitrarily defined groups still relies on subjective visual inspection.

RESULTS

To objectively evaluate the degree of separation between groups of interest in the HSC dendrogram, we propose to use Silhouette scores. Silhouettes was originally developed as a graphical aid for the validation of data clusters. It provides a measure of how well a sample is classified when assigned to a cluster, according to both the tightness of the clusters and the separation between them. It ranges from -1.0 to 1.0, and a larger value of the average silhouette (AS) over all analyzed samples indicates a higher degree of separation. The basic idea of using an AS is to replace the term "cluster" with "group" when calculating the scores. We investigated the validity of this score using simulated and real data designed for differential expression (DE) analysis. We found that larger (or smaller) AS values agreed well with both higher (or lower) degrees of separation between different groups and higher (or lower) percentages of differentially expressed genes (P_DEG). We also found that the AS values were generally independent of the number of replicates (N_rep). Although the P_DEG values depended on N_rep, we confirmed that both AS and P_DEG values were close to zero when samples in the data were intermingled between the groups in the HSC dendrogram.
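As an illustration of the calculation described above, the following is a minimal sketch and not the authors' implementation. It uses the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other samples in its own group and b(i) is the smallest mean distance from sample i to the samples of any other group. Predefined group labels take the place of cluster assignments. The function name `average_silhouette`, the Euclidean distance, the toy data, and the convention of scoring a single-sample group as 0 are all assumptions made for this example; the abstract does not state which distance measure was used.

```python
import math
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two expression vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_silhouette(samples, groups, dist=euclidean):
    """Average silhouette (AS) over all samples.

    `samples` is a list of numeric vectors (one per sample) and `groups`
    is a list of predefined labels of the same length. The labels are used
    exactly where cluster assignments would normally be used.
    """
    if len(samples) != len(groups):
        raise ValueError("samples and groups must have the same length")

    # Collect the sample indices belonging to each group.
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if len(members) < 2:
        raise ValueError("at least two groups are required")

    scores = []
    for i, label in enumerate(groups):
        own = [j for j in members[label] if j != i]
        if not own:
            # Assumed convention: a group with a single sample scores 0.
            scores.append(0.0)
            continue
        # a(i): tightness, mean distance to the rest of the sample's own group.
        a = sum(dist(samples[i], samples[j]) for j in own) / len(own)
        # b(i): separation, smallest mean distance to any other group.
        b = min(
            sum(dist(samples[i], samples[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)

    return sum(scores) / len(scores)


# Toy data: two tight pairs of samples that lie far apart.
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]

# Labels that match the structure give a high AS.
as_separated = average_silhouette(toy, ["A", "A", "B", "B"])
# Labels that cut across the structure give a much lower AS.
as_mixed = average_silhouette(toy, ["A", "B", "A", "B"])
print(as_separated, as_mixed)
```

The same samples are scored twice with different labelings, which mirrors the use case in the abstract: the data stay fixed and only the grouping criterion changes, so the AS indicates how well each candidate grouping agrees with the structure of the data.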

CONCLUSION

Silhouette scores are useful for exploring data with predefined group labels. They can provide both an objective evaluation of HSC dendrograms and insights into the DE results for the compared groups.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14f2/5831220/d16f927e6239/12575_2018_67_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14f2/5831220/e791b975c862/12575_2018_67_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14f2/5831220/abd397a7a597/12575_2018_67_Fig2_HTML.jpg
