Band-based similarity indices for gene expression classification and clustering.

Affiliation

Departamento de Matemáticas, Instituto Gregorio Millán, Universidad Carlos III de Madrid, 28911, Leganés, Spain.

Publication information

Sci Rep. 2021 Nov 3;11(1):21609. doi: 10.1038/s41598-021-00678-9.

DOI:10.1038/s41598-021-00678-9

PMID:34732744

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8566472/

Abstract

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated to each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. However, the use of the method is not restricted to these, the extension to other similarity coefficients being straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
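The pipeline described in the abstract (bands, a binary matrix per observation, a contingency table per pair, a similarity index) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bands formed by two observations, treats a coordinate as "inside" a band when it lies between the coordinate-wise minimum and maximum of the two observations, and uses the Jaccard and simple matching coefficients as example indices. The function names `band_indicator`, `contingency`, `jaccard`, and `simple_matching` are hypothetical.

```python
from itertools import combinations


def band_indicator(x, data):
    """Binary matrix for observation x (assumed construction, bands of size 2).

    One row per pair of observations in `data`, one column per coordinate.
    An entry is 1 when x's coordinate lies inside the band [min, max]
    spanned by the two observations at that coordinate, else 0.
    """
    rows = []
    for u, v in combinations(data, 2):
        rows.append([
            1 if min(a, b) <= xi <= max(a, b) else 0
            for xi, a, b in zip(x, u, v)
        ])
    return rows


def contingency(bx, by):
    """2x2 contingency counts (a, b, c, d) between two binary matrices.

    a: both 1, b: only the first is 1, c: only the second is 1, d: both 0.
    """
    a = b = c = d = 0
    for row_x, row_y in zip(bx, by):
        for p, q in zip(row_x, row_y):
            if p and q:
                a += 1
            elif p:
                b += 1
            elif q:
                c += 1
            else:
                d += 1
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard coefficient: shared inclusions over all inclusions."""
    total = a + b + c
    return a / total if total else 0.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient: agreements over all entries."""
    return (a + d) / (a + b + c + d)


if __name__ == "__main__":
    # Toy data: three observations with two coordinates each.
    data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    bx = band_indicator(data[0], data)
    by = band_indicator(data[1], data)
    table = contingency(bx, by)
    print(table, jaccard(*table), simple_matching(*table))
```

Under these assumptions, any other similarity coefficient defined on the counts `(a, b, c, d)` could be swapped in for the two shown, which matches the abstract's remark that extending the method to other coefficients is straightforward.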
