
Band-based similarity indices for gene expression classification and clustering.

Affiliation

Departamento de Matemáticas, Instituto Gregorio Millán, Universidad Carlos III de Madrid, 28911, Leganés, Spain.

Publication information

Sci Rep. 2021 Nov 3;11(1):21609. doi: 10.1038/s41598-021-00678-9.

DOI:10.1038/s41598-021-00678-9
PMID:34732744
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8566472/
Abstract

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated to each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. However, the use of the method is not restricted to these, the extension to other similarity coefficients being straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
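
The abstract describes the construction only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes bands formed by two observations, builds for each observation a binary matrix with one row per band and one column per coordinate (an entry is 1 when that coordinate lies inside the band), and averages it to obtain the Modified Band Depth. For a pair of observations it then cross-tabulates their two binary matrices into a 2x2 contingency table and derives the Jaccard and simple matching coefficients as examples of standard similarity indices. The function names and the exact layout of the binary matrices are hypothetical; the paper's own definitions may differ.

```python
from itertools import combinations

import numpy as np


def band_indicators(data):
    """Return a boolean array of shape (n_obs, n_bands, n_dims).

    Entry [k, b, t] is True when coordinate t of observation k lies inside
    band b, i.e. between the two observations that delimit that band.
    Assumption: bands are formed by all pairs of observations (J = 2).
    """
    data = np.asarray(data, dtype=float)
    pairs = list(combinations(range(len(data)), 2))
    lower = np.array([np.minimum(data[i], data[j]) for i, j in pairs])
    upper = np.array([np.maximum(data[i], data[j]) for i, j in pairs])
    # Broadcast observations (n, 1, d) against band limits (1, m, d).
    x = data[:, None, :]
    return (lower[None] <= x) & (x <= upper[None])


def modified_band_depth(data):
    """MBD with two-observation bands: mean fraction of coordinates inside a band."""
    return band_indicators(data).mean(axis=(1, 2))


def contingency_table(u, v):
    """Counts (a, b, c, d) of a 2x2 table for two boolean arrays of equal shape."""
    u = np.asarray(u, dtype=bool).ravel()
    v = np.asarray(v, dtype=bool).ravel()
    a = int(np.sum(u & v))    # 1 in both
    b = int(np.sum(u & ~v))   # 1 only in u
    c = int(np.sum(~u & v))   # 1 only in v
    d = int(np.sum(~u & ~v))  # 0 in both
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard coefficient; defined as 1.0 here when both arrays are all zeros."""
    return a / (a + b + c) if (a + b + c) else 1.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient: share of positions where both arrays agree."""
    return (a + d) / (a + b + c + d)


if __name__ == "__main__":
    # Toy data: three observations in two dimensions (illustrative only).
    toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    inside = band_indicators(toy)
    print("MBD:", modified_band_depth(toy))
    table = contingency_table(inside[0], inside[1])
    print("table (a, b, c, d):", table)
    print("Jaccard:", jaccard(*table))
    print("simple matching:", simple_matching(*table))
```

In the toy data the middle observation lies inside every band and therefore receives the largest depth. The indicator array holds one entry per observation, band and coordinate, so this direct version is only practical for small data sets; it does not reproduce the computational efficiency the abstract reports.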


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/bedae9068fa2/41598_2021_678_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/c01e3c0712a8/41598_2021_678_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/c28ab1d8bc9d/41598_2021_678_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/9507e989446d/41598_2021_678_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/7f14a2a5f8d8/41598_2021_678_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/62a56b275153/41598_2021_678_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/29fea0d6e44f/41598_2021_678_Fig7_HTML.jpg
Fig. 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/be6ad43a4703/41598_2021_678_Fig8_HTML.jpg
Fig. 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb5a/8566472/8da2484673c7/41598_2021_678_Fig9_HTML.jpg
