Biclustering sparse binary genomic data.

Author information

van Uitert Miranda, Meuleman Wouter, Wessels Lodewyk

Affiliations

Bioinformatics and Statistics, Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

Publication information

J Comput Biol. 2008 Dec;15(10):1329-45. doi: 10.1089/cmb.2008.0066.

DOI:10.1089/cmb.2008.0066

PMID:19040367

Abstract

Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
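To make the notion of a block that "(mostly) contains ones" concrete, the following minimal sketch computes the fraction of ones inside a candidate bicluster of a sparse binary matrix. It is not the algorithm presented in the article, which the abstract does not describe. The function name `bicluster_density`, the set-of-coordinates representation, and the toy data are all assumptions made for illustration, as is treating a bicluster as a set of row indices paired with a set of column indices.

```python
from typing import Sequence, Set, Tuple


def bicluster_density(
    ones: Set[Tuple[int, int]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Return the fraction of cells equal to 1 in the submatrix rows x cols.

    `ones` is a sparse representation (hypothetical, for illustration):
    the set of (row, col) positions holding a 1; all other cells are 0.
    """
    if not rows or not cols:
        raise ValueError("a bicluster needs at least one row and one column")
    # Count the cells of the candidate block that are present in the sparse set.
    hits = sum((r, c) in ones for r in rows for c in cols)
    return hits / (len(rows) * len(cols))


# Hypothetical toy data: 4 rows (e.g. transcription factors) x 5 columns (e.g. targets).
ones = {(0, 1), (0, 2), (1, 1), (1, 2), (1, 3), (2, 2), (3, 4)}

dense = bicluster_density(ones, rows=[0, 1], cols=[1, 2])      # block made only of ones
whole = bicluster_density(ones, rows=range(4), cols=range(5))  # the full sparse matrix
print(dense, whole)
```

In this toy matrix the 2 x 2 block is fully populated while the matrix as a whole is mostly zeros, which is the contrast a biclustering method for sparse binary data has to exploit.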

Similar articles

Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm.

IEEE Trans Inf Technol Biomed. 2009 Nov;13(6):969-75. doi: 10.1109/TITB.2009.2017527. Epub 2009 Mar 16.

A biclustering algorithm for extracting bit-patterns from binary datasets.

Bioinformatics. 2011 Oct 1;27(19):2738-45. doi: 10.1093/bioinformatics/btr464. Epub 2011 Aug 8.

KMeans greedy search hybrid algorithm for biclustering gene expression data.

Adv Exp Med Biol. 2010;680:181-8. doi: 10.1007/978-1-4419-5913-3_21.

Robust biclustering by sparse singular value decomposition incorporating stability selection.

Bioinformatics. 2011 Aug 1;27(15):2089-97. doi: 10.1093/bioinformatics/btr322. Epub 2011 Jun 2.

Identification of bicluster regions in a binary matrix and its applications.

PLoS One. 2013 Aug 5;8(8):e71680. doi: 10.1371/journal.pone.0071680. Print 2013.

Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies.

BMC Bioinformatics. 2008 Oct 27;9:458. doi: 10.1186/1471-2105-9-458.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

J Theor Biol. 2008 Mar 21;251(2):264-74. doi: 10.1016/j.jtbi.2007.11.030. Epub 2007 Dec 4.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

Gene expression data analysis using a novel approach to biclustering combining discrete and continuous data.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):583-93. doi: 10.1109/TCBB.2007.70251.

Cited by

Detecting significant expression patterns in single-cell and spatial transcriptomics with a flexible computational approach.

Sci Rep. 2024 Oct 30;14(1):26121. doi: 10.1038/s41598-024-75314-3.

Bayesian Double Feature Allocation for Phenotyping with Electronic Health Records.

J Am Stat Assoc. 2020;115(532):1620-1634. doi: 10.1080/01621459.2019.1686985. Epub 2019 Dec 9.

RUBic: rapid unsupervised biclustering.

BMC Bioinformatics. 2023 Nov 16;24(1):435. doi: 10.1186/s12859-023-05534-3.

Semantic biclustering for finding local, interpretable and predictive expression patterns.

BMC Genomics. 2017 Oct 16;18(Suppl 7):752. doi: 10.1186/s12864-017-4132-5.

Large-scale bioactivity analysis of the small-molecule assayed proteome.

PLoS One. 2017 Feb 8;12(2):e0171413. doi: 10.1371/journal.pone.0171413. eCollection 2017.

A Tabu-Search Heuristic for Deterministic Two-Mode Blockmodeling of Binary Network Matrices.

Psychometrika. 2011 Oct;76(4):612-33. doi: 10.1007/s11336-011-9221-9. Epub 2011 Jul 14.

Fastbreak: a tool for analysis and visualization of structural variations in genomic data.

EURASIP J Bioinform Syst Biol. 2012 Oct 9;2012(1):15. doi: 10.1186/1687-4153-2012-15.
