BicSPAM: flexible biclustering using sequential patterns.

Author information

Henriques Rui, Madeira Sara C

Affiliations

Knowledge Discovery and BIOInformatics group (KDBIO), INESC-ID, and Computer Science and Engineering (CSE) Department, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal.

Publication information

BMC Bioinformatics. 2014 May 6;15:130. doi: 10.1186/1471-2105-15-130.

Abstract

BACKGROUND

Biclustering is a critical task for biomedical applications. Order-preserving biclusters, submatrices where the values of rows induce the same linear ordering across columns, capture local regularities with constant, shifting, scaling and sequential assumptions. Additionally, biclustering approaches relying on pattern mining output deliver exhaustive solutions with an arbitrary number and positioning of biclusters. However, existing order-preserving approaches suffer from robustness, scalability and/or flexibility issues. Additionally, they are not able to discover biclusters with symmetries and parameterizable levels of noise.
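To make the definition of an order-preserving bicluster concrete, the following minimal sketch checks whether a chosen set of rows induces the same linear ordering over a chosen set of columns. It is only an illustration of the definition, not the authors' algorithm; the function names, the toy matrix, and the tie-breaking rule (by column index) are assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def column_order(row: Sequence[float], cols: Sequence[int]) -> Tuple[int, ...]:
    """Return the columns in `cols` sorted by the row's values.

    Ties are broken by column index (an assumption made for this sketch).
    """
    return tuple(sorted(cols, key=lambda c: (row[c], c)))


def is_order_preserving(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> bool:
    """True if every selected row induces the same ordering of the selected columns."""
    orders = {column_order(matrix[r], cols) for r in rows}
    return len(orders) <= 1


# Hypothetical toy data: rows 1 and 2 are a scaled and a shifted/compressed
# variant of row 0 on columns 0..2, so all three share one column ordering.
matrix: List[List[float]] = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 10.0, 6.0, 0.0],
    [4.0, 8.0, 6.0, 7.0],
    [9.0, 1.0, 5.0, 2.0],
]

print(column_order(matrix[0], [0, 1, 2]))
print(is_order_preserving(matrix, rows=[0, 1, 2], cols=[0, 1, 2]))
print(is_order_preserving(matrix, rows=[0, 1, 2, 3], cols=[0, 1, 2]))
```

Because only the relative order of values matters, constant, shifted, and scaled rows all pass the same check, which is why a single order-preserving model can cover those assumptions.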

RESULTS

We propose new biclustering algorithms to perform flexible, exhaustive and noise-tolerant biclustering based on sequential patterns (BicSPAM). Strategies are proposed to allow for symmetries and to seize efficiency gains from item-indexable properties and/or from partitioning methods with conservative distance guarantees. Results show BicSPAM's ability to capture symmetries, handle planted noise, and scale in terms of memory and time. BicSPAM also achieves the best match-scores for the recovery of hidden biclusters in synthetic datasets with varying noise distributions and levels of missing values. Finally, results on gene expression data lead to complete solutions, delivering new biclusters corresponding to putative modules with heightened biological relevance.
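One way to picture the sequential-pattern view is to turn each row into the sequence of its column indices sorted by value, and then count which rows contain a candidate ordering as a subsequence. The sketch below does this by brute force and, as an assumption about what "symmetries" means here, also lets a row support the pattern when its sign-flipped values do. None of the names, data, or design choices come from the paper; a real sequential pattern miner would enumerate frequent patterns rather than test one given pattern.

```python
from typing import Iterable, List, Sequence, Tuple


def row_to_sequence(row: Sequence[float]) -> List[int]:
    """Column indices sorted by ascending value (ties broken by column index)."""
    return [c for c, _ in sorted(enumerate(row), key=lambda cv: (cv[1], cv[0]))]


def is_subsequence(pattern: Iterable[int], sequence: Iterable[int]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def supporting_rows(
    matrix: Sequence[Sequence[float]],
    pattern: Sequence[int],
    allow_symmetry: bool = False,
) -> List[Tuple[int, int]]:
    """Return (row index, sign) pairs for rows that support the column ordering.

    Sign +1 means the row supports the pattern as is; sign -1 means it does so
    only after negating its values (hypothetical treatment of symmetries).
    """
    support = []
    for r, row in enumerate(matrix):
        if is_subsequence(pattern, row_to_sequence(row)):
            support.append((r, 1))
        elif allow_symmetry and is_subsequence(
            pattern, row_to_sequence([-v for v in row])
        ):
            support.append((r, -1))
    return support


# Hypothetical toy data; row 2 follows the pattern only after a sign flip.
matrix = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 10.0, 6.0, 0.0],
    [-1.0, -5.0, -3.0, 4.0],
    [5.0, 1.0, 9.0, 2.0],
]
pattern = [0, 2, 1]  # column 0 < column 2 < column 1

print(supporting_rows(matrix, pattern))
print(supporting_rows(matrix, pattern, allow_symmetry=True))
```

In this picture, the supporting rows together with the columns of the pattern form a candidate bicluster, and a minimum number of supporting rows would play the role of the usual support threshold.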

CONCLUSIONS

BicSPAM provides an exhaustive way to discover flexible structures of order-preserving biclusters. To the best of our knowledge, BicSPAM is the first attempt to deal with order-preserving biclusters that allow for symmetries and that are robust to varying levels of noise.
