Alavi Majd Hamid, Shahsavari Soodeh, Baghestani Ahmad Reza, Tabatabaei Seyyed Mohammad, Khadem Bashi Naghme, Rezaei Tavirani Mostafa, Hamidpour Mohsen
Biostatistics Department, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran 19716-53313, Iran.
Medical Informatics Department, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran 19716-53313, Iran.
Scientifica (Cairo). 2016;2016:3059767. doi: 10.1155/2016/3059767. Epub 2016 Mar 9.
Background. Biclustering algorithms have been proposed for the analysis of high-dimensional gene expression data. Among them, the plaid model is arguably one of the most flexible to date. Objective. This study evaluates the plaid model on both simulated data and real gene expression datasets. Methods. Two simulated matrices with different degrees of overlap and noise were generated, and the known structure of these data was compared with the biclustering results. The biological significance of the discovered biclusters was also assessed by Gene Ontology (GO) analysis. Results. With no noise, the algorithm discovered almost all of the biclusters. With moderate noise, it performed poorly in finding overlapping biclusters, and with high noise the biclustering results were not reliable. Conclusion. The plaid model needs to be modified, because it cannot find good biclusters when the data contain moderate or high noise. Because it is a flexible statistical model, the errors could be reduced by adjusting the model and changing the assumed error distribution.
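The simulation design described in the Methods can be illustrated with a minimal sketch. The abstract does not report matrix dimensions, bicluster positions, effect sizes, noise levels, or the measure used to compare the true structure with the biclustering results, so every such value below is an assumption chosen for illustration: the function names (simulate_plaid, jaccard, recovery_score), the two overlapping biclusters, the constant layer means, and the Jaccard-based score are all hypothetical. The sketch follows the additive idea of the plaid model, in which a cell covered by several layers receives the sum of their effects, but it keeps only a constant mean per layer and omits row and column effects.

```python
import numpy as np


def simulate_plaid(n_rows=100, n_cols=50, noise_sd=0.0, seed=0):
    """Build a matrix with two overlapping additive layers plus Gaussian noise.

    All sizes, positions, and layer means are illustrative assumptions,
    not values taken from the study.
    """
    rng = np.random.default_rng(seed)
    # Two hypothetical biclusters sharing rows 20-29 and columns 10-14.
    biclusters = [
        (np.arange(0, 30), np.arange(0, 15)),
        (np.arange(20, 50), np.arange(10, 25)),
    ]
    layer_means = [2.0, 3.0]  # assumed constant effect of each layer

    data = np.zeros((n_rows, n_cols))
    for (rows, cols), mu in zip(biclusters, layer_means):
        # Layers add up, so overlapping cells receive both effects.
        data[np.ix_(rows, cols)] += mu
    if noise_sd > 0:
        data += rng.normal(0.0, noise_sd, size=data.shape)
    return data, biclusters


def _cells(rows, cols):
    """Return the set of (row, column) cells covered by a bicluster."""
    return {(int(r), int(c)) for r in rows for c in cols}


def jaccard(bic_a, bic_b):
    """Jaccard index between the cell sets of two biclusters."""
    a, b = _cells(*bic_a), _cells(*bic_b)
    return len(a & b) / len(a | b)


def recovery_score(true_bics, found_bics):
    """Average, over true biclusters, of the best Jaccard match found."""
    if not found_bics:
        return 0.0
    best = [max(jaccard(t, f) for f in found_bics) for t in true_bics]
    return float(np.mean(best))
```

Calling simulate_plaid with increasing noise_sd gives matrices that correspond loosely to the no-noise, moderate-noise, and high-noise settings in the abstract. The biclusters returned by whichever plaid implementation is used can then be passed to recovery_score as (row indices, column indices) pairs to quantify how much of the planted structure was recovered.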