Expectation Maximization of Frequent Patterns, a Specific, Local, Pattern-Based Biclustering Algorithm for Biological Datasets.

Author information

Moore Erin Jessica, Bourlai Thirmachos

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):812-824. doi: 10.1109/TCBB.2015.2510011. Epub 2015 Dec 17.

Abstract

Currently, binary biclustering algorithms are too slow and non-specific to handle biological datasets that have a large number of attributes, which is essential for the computational biology problem of microarray analysis. Because of their large resource needs, such algorithms may require specialized computers and may still fail to produce a solution. The resulting biclusters also include too many false positives (type I errors), which hinders biological discovery. We propose an algorithm, EMFP, that can analyze datasets with a large attribute set at different densities and can run on a laptop, which makes it accessible to practitioners. EMFP produces biclusters that have a very low Root Mean Squared Error and false positive rate, with very few type II errors. Our binary biclustering algorithm is a hybrid, axis-parallel, pattern-based algorithm that finds multiple, non-overlapping, near-constant, deterministic, binary submatrices, with a variable confidence threshold and the novel use of local density comparisons in place of the standard global threshold. EMFP introduces a new and intuitive way to calculate internal measures for binary biclustering methods. We also introduce a framework to ease comparison with other algorithms, and compare EMFP to both binary and general biclustering algorithms using two real and 80 synthetic databases.
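To make the quantities named in the abstract concrete, here is a minimal sketch of how a candidate binary bicluster might be scored. It is not EMFP and does not reproduce the paper's definitions: the toy matrix, the helper names (`density`, `rmse_vs_constant`, `false_positive_rate`), the choice of an all-ones target pattern, and the cell-level false positive rate against a planted ground truth are all assumptions made for illustration.

```python
from math import sqrt

def cells(rows, cols):
    """Set of (row, col) positions covered by a bicluster."""
    return {(r, c) for r in rows for c in cols}

def density(matrix, rows, cols):
    """Fraction of ones inside the selected submatrix."""
    values = [matrix[r][c] for r in rows for c in cols]
    return sum(values) / len(values)

def rmse_vs_constant(matrix, rows, cols, target=1):
    """RMSE of the submatrix against a constant pattern (assumed: all ones)."""
    values = [matrix[r][c] for r in rows for c in cols]
    return sqrt(sum((target - v) ** 2 for v in values) / len(values))

def false_positive_rate(found, truth, n_rows, n_cols):
    """Cell-level FPR: found cells outside the true bicluster / all true negatives."""
    found_cells = cells(*found)
    true_cells = cells(*truth)
    negatives = n_rows * n_cols - len(true_cells)
    false_pos = len(found_cells - true_cells)
    return false_pos / negatives if negatives else 0.0

# Hypothetical 4 x 5 binary dataset with a planted bicluster in the top-left.
matrix = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
truth = ([0, 1], [0, 1, 2])     # planted bicluster: (rows, columns)
found = ([0, 1, 2], [0, 1, 2])  # candidate that includes one extra row

n_rows, n_cols = len(matrix), len(matrix[0])
global_density = density(matrix, range(n_rows), range(n_cols))
local_density = density(matrix, *found)
rmse = rmse_vs_constant(matrix, *found)
fpr = false_positive_rate(found, truth, n_rows, n_cols)

print(f"global density: {global_density:.3f}")
print(f"local density:  {local_density:.3f}")
print(f"RMSE vs ones:   {rmse:.3f}")
print(f"cell-level FPR: {fpr:.3f}")
```

In this toy case the candidate is much denser than the matrix as a whole, which is the kind of local-versus-global contrast the abstract alludes to. The extra row still adds cells that lie outside the planted bicluster, and the false positive rate reflects that.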
