College of Intelligence and Computing, Tianjin University.
School of Management, Shenzhen Polytechnic.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa216.
Single-cell RNA-sequencing (scRNA-seq) data are widespread in bioinformatics, and devising a suitable distance metric for such data is crucial. Almost all existing clustering methods based on spectral clustering work in three separate steps: similarity graph construction, continuous label learning, and discretization of the learned labels by k-means clustering. However, this common practice has potential flaws that may lead to severe information loss and degraded performance. Furthermore, the performance of a kernel method is largely determined by the selected kernel; a self-weighted multiple kernel learning model can help choose the most suitable kernel for scRNA-seq data. To this end, we propose to learn similarity information from the data automatically. We present a new clustering method, in the form of a multiple kernel combination, that can directly discover groupings in scRNA-seq data. The main proposition is that the automatically learned similarity information is used to transform the candidate solution into a new solution that better approximates the discrete one. The proposed model can be solved efficiently by standard support vector machine (SVM) solvers. Experiments on benchmark scRNA-seq data validate the superior performance of the proposed model. Spectral clustering with multiple kernels is implemented in MATLAB, released under the MIT License, and freely available on GitHub at https://github.com/Cuteu/SMSC/.
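For orientation, the conventional three-step pipeline that the abstract criticizes can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' SMSC implementation (which is in MATLAB) and not the proposed model. It makes several assumptions: the kernels are Gaussian, the kernel weights are fixed and equal rather than learned, only two clusters are sought, and all function names and the toy expression values are hypothetical.

```python
import math
import random


def gaussian_kernel(cells, sigma):
    """Gaussian (RBF) kernel matrix between all pairs of cells."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in cells]
            for a in cells]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices (weights are fixed here, not learned)."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def continuous_labels(similarity, iters=200):
    """Step 2: second eigenvector of D^-1/2 S D^-1/2 via power iteration.

    The top eigenvector is known in closed form (proportional to sqrt(degree)),
    so it is projected out before every multiplication.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    m = [[similarity[i][j] / math.sqrt(degree[i] * degree[j])
          for j in range(n)] for i in range(n)]
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Rescale by D^-1/2 so entries are roughly constant within a cluster.
    return [v[i] / math.sqrt(degree[i]) for i in range(n)]


def kmeans_two_clusters_1d(values, iters=20):
    """Step 3: discretize a 1-D embedding with k-means, k = 2."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in values]
        g0 = [x for x, lab in zip(values, labels) if lab == 0]
        g1 = [x for x, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def two_way_spectral_clustering(cells, sigmas):
    kernels = [gaussian_kernel(cells, s) for s in sigmas]
    weights = [1.0 / len(kernels)] * len(kernels)  # assumption: equal weights
    similarity = combine_kernels(kernels, weights)  # step 1: similarity graph
    embedding = continuous_labels(similarity)       # step 2: continuous labels
    return kmeans_two_clusters_1d(embedding)        # step 3: discretization


# Toy data: six "cells" measured on three "genes", forming two obvious groups.
cells = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0],
         [5.0, 5.1, 4.9], [5.1, 4.9, 5.0], [4.9, 5.0, 5.1]]
labels = two_way_spectral_clustering(cells, sigmas=[0.5, 1.0, 2.0])
print(labels)
```

Each stage here is computed independently of the others, and the kernel weights never adapt to the data. Those are the two limitations the abstract says the proposed model addresses by learning the similarity and the kernel combination jointly with the cluster assignment.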