Expectation Maximization of Frequent Patterns, a Specific, Local, Pattern-Based Biclustering Algorithm for Biological Datasets.

Authors

Moore Erin Jessica, Bourlai Thirmachos

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):812-824. doi: 10.1109/TCBB.2015.2510011. Epub 2015 Dec 17.

DOI:10.1109/TCBB.2015.2510011

Abstract

Current binary biclustering algorithms are too slow and too non-specific to handle biological datasets with a large number of attributes, a capability that is essential for microarray analysis in computational biology. Such an algorithm may require specialized hardware and, because of its large resource needs, may still fail to produce a solution. The resulting biclusters also include too many false positives (type I errors), which hinders biological discovery. We propose an algorithm, EMFP, that can analyze datasets with a large attribute set at different densities and can run on a laptop, which makes it accessible to practitioners. EMFP produces biclusters that have a very low root mean squared error and false positive rate, with very few type II errors. Our binary biclustering algorithm is a hybrid, axis-parallel, pattern-based algorithm that finds multiple, non-overlapping, near-constant, deterministic, binary submatrices, using a variable confidence threshold and, as a novel feature, local density comparisons instead of the standard global threshold. EMFP also introduces a new and intuitive way to calculate internal measures for binary biclustering methods. We also introduce a framework to ease comparison with other algorithms, and compare EMFP with both binary and general biclustering algorithms using two real and 80 synthetic databases.
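The abstract names several quantities (bicluster density compared locally versus globally, root mean squared error, and false positive rate) without defining them. The sketch below is only an illustration of how such measures could be computed on a toy binary matrix. It is not the EMFP procedure: the function names, the all-ones reference pattern, and the cell-level definition of false positives are assumptions made for this example.

```python
from math import sqrt

def density(matrix, rows, cols):
    """Fraction of 1s in the submatrix selected by rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)

def rmse_vs_constant_ones(matrix, rows, cols):
    """RMSE between the submatrix and an all-ones pattern (assumed reference)."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sqrt(sum((1 - v) ** 2 for v in cells) / len(cells))

def false_positive_rate(found, truth, shape):
    """Cell-level FPR of a found bicluster against a planted one.

    found and truth are (rows, cols) pairs; shape is (n_rows, n_cols).
    This cell-level definition is an assumption for illustration.
    """
    n_rows, n_cols = shape
    found_cells = {(r, c) for r in found[0] for c in found[1]}
    truth_cells = {(r, c) for r in truth[0] for c in truth[1]}
    false_pos = len(found_cells - truth_cells)
    negatives = n_rows * n_cols - len(truth_cells)
    return false_pos / negatives if negatives else 0.0

# Toy 4x5 binary matrix; the candidate bicluster is rows 0-1, columns 0-2.
data = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
rows, cols = [0, 1], [0, 1, 2]

local_density = density(data, rows, cols)             # inside the candidate
global_density = density(data, range(4), range(5))    # whole matrix
error = rmse_vs_constant_ones(data, rows, cols)

# A found bicluster that wrongly includes column 3.
fpr = false_positive_rate(([0, 1], [0, 1, 2, 3]), (rows, cols), (4, 5))
```

Here the candidate's density can be compared with the density of the whole matrix or of its neighbourhood, which is one plausible reading of "local density comparisons"; the abstract itself does not say how EMFP performs that comparison.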

Similar articles

A biclustering algorithm for extracting bit-patterns from binary datasets.

Bioinformatics. 2011 Oct 1;27(19):2738-45. doi: 10.1093/bioinformatics/btr464. Epub 2011 Aug 8.

Discovery of error-tolerant biclusters from noisy gene expression data.

BMC Bioinformatics. 2011 Nov 24;12 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-12-S12-S1.

Biclustering with Flexible Plaid Models to Unravel Interactions between Biological Processes.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):738-52. doi: 10.1109/TCBB.2014.2388206.

A systematic comparison and evaluation of biclustering methods for gene expression data.

Bioinformatics. 2006 May 1;22(9):1122-9. doi: 10.1093/bioinformatics/btl060. Epub 2006 Feb 24.

Gene expression data analysis using a novel approach to biclustering combining discrete and continuous data.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):583-93. doi: 10.1109/TCBB.2007.70251.

Discovering biclusters in gene expression data based on high-dimensional linear geometries.

BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.

Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.

BMC Bioinformatics. 2008 Apr 23;9:210. doi: 10.1186/1471-2105-9-210.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

A New Binary Biclustering Algorithm Based on Weight Adjacency Difference Matrix for Analyzing Gene Expression Data.

IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):2802-2809. doi: 10.1109/TCBB.2023.3283801. Epub 2023 Oct 9.
