Row and Column Structure-Based Biclustering for Gene Expression Data.

Publication Information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1117-1129. doi: 10.1109/TCBB.2020.3022085. Epub 2022 Apr 1.

Abstract

High-throughput technologies for gene analysis have drawn considerable attention to biclustering. However, existing methods suffer from high time and space complexity. This paper proposes a biclustering method, called Row and Column Structure-based Biclustering (RCSBC), that finds checkerboard patterns in microarray data with low time and space complexity. First, the paper describes the structure of a bicluster in terms of the structure of rows and columns. Second, it selects representative rows and columns using two algorithms. Finally, the gene expression data are biclustered on the space spanned by the representative rows and columns. To the best of our knowledge, this paper is the first to exploit the relationship between the row/column structure of a gene expression matrix and the structure of biclusters. Both synthetic datasets and real-life gene expression datasets are used to validate the effectiveness of the method. The experimental results show that RCSBC outperforms state-of-the-art algorithms in both clustering accuracy and time/space complexity. This study offers new insights into biclustering large-scale gene expression data without loading the whole dataset into memory.
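
The abstract does not specify the two selection algorithms, so the following is only a rough sketch of the general idea (pick representative rows and columns, then cluster every row and column using only its values on those representatives). It is not the RCSBC algorithm. The greedy farthest-point selection, the nearest-representative assignment, and all function names (`pick_representatives`, `assign_to_nearest`, `checkerboard_sketch`) are assumptions introduced for illustration. Unlike the method described above, this selection step scans the full matrix.

```python
import numpy as np


def pick_representatives(vectors, k, rng):
    """Greedy farthest-point selection of k indices.

    Assumed stand-in for the paper's selection algorithms, which the
    abstract does not describe.
    """
    chosen = [int(rng.integers(len(vectors)))]
    dist = np.linalg.norm(vectors - vectors[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # vector farthest from all chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return np.array(chosen)


def assign_to_nearest(features, reps):
    """Label each feature vector with the index of its nearest representative."""
    d = np.linalg.norm(features[:, None, :] - reps[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def checkerboard_sketch(X, n_row_clusters, n_col_clusters, seed=0):
    """Return (row_labels, col_labels) for a checkerboard-style partition of X."""
    rng = np.random.default_rng(seed)
    rep_rows = pick_representatives(X, n_row_clusters, rng)
    rep_cols = pick_representatives(X.T, n_col_clusters, rng)

    # Describe each row by its values on the representative columns only,
    # and each column by its values on the representative rows only.
    row_feat = X[:, rep_cols]      # shape (n_rows, n_col_clusters)
    col_feat = X[rep_rows, :].T    # shape (n_cols, n_row_clusters)

    row_labels = assign_to_nearest(row_feat, row_feat[rep_rows])
    col_labels = assign_to_nearest(col_feat, col_feat[rep_cols])
    return row_labels, col_labels


# Hypothetical toy data: a noise-free 2 x 3 checkerboard of block means.
block_means = np.array([[0.0, 5.0, 10.0],
                        [10.0, 0.0, 5.0]])
true_rows = np.array([0, 0, 0, 1, 1, 1, 1])
true_cols = np.array([0, 0, 1, 1, 1, 2, 2, 2])
X = block_means[true_rows][:, true_cols]

row_labels, col_labels = checkerboard_sketch(X, n_row_clusters=2, n_col_clusters=3)
```

The label values themselves are arbitrary (they follow the order in which representatives were picked), so a result should be compared with a reference partition by co-membership rather than by label equality. After selection, the clustering step touches only `X[:, rep_cols]` and `X[rep_rows, :]`, which is the part of this sketch that echoes the low-memory motivation stated in the abstract.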
