
An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

Author information

Booma P M, Prabhakaran S, Dhanalakshmi R

Affiliations

Department of Computer and Engineering, KCG College of Technology, KCG Nagar, Rajiv Gandhi Salai, Karapakkam, Chennai, Tamil Nadu 600097, India.

Department of Computer Science and Engineering, SRM University, SRM Nagar, Kattankulathur, Kanchipuram, National Highway 45, Potheri, Tamil Nadu 603203, India.

Publication information

ScientificWorldJournal. 2014;2014:357873. doi: 10.1155/2014/357873. Epub 2014 Jun 16.

Abstract

Microarray gene expression datasets have attracted great attention from molecular biologists, statisticians, and computer scientists. Data mining that extracts hidden and useful information from datasets fails to identify the most significant biological associations between genes. A heuristic search over standard biological process measures considers only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but it does not efficiently address the association process. To monitor higher rates of expression between genes, a hierarchical clustering model is proposed in which the biological association between genes is measured simultaneously using a proximity measure based on improved Pearson's correlation (PCPHC). In addition, the Seed Augment algorithm applies average linkage on rows and columns to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between clusters. The GL-PCPHC model further applies a pattern-growing method to mine PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. The GL-PCPHC model is evaluated experimentally on standard benchmark gene expression datasets from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency, and pattern quality.
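The abstract does not define the "improved" Pearson measure or the Seed Augment algorithm, so the following is only a minimal sketch of the general idea it builds on, assuming plain Pearson correlation converted to a distance d = 1 - r and standard average-linkage (UPGMA) agglomeration over gene rows. The function names `pearson` and `average_linkage` are hypothetical illustrations, not the authors' PCPHC or GL-PCPHC code.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Standard (not 'improved') Pearson correlation between two expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: r is undefined, so treat it as uncorrelated
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of genes using distance 1 - r and average linkage.

    profiles: dict mapping a gene name to its list of expression values.
    Returns a list of frozensets of gene names.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    names = list(profiles)
    # Precompute pairwise gene-to-gene distances (correlated genes are close).
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [frozenset([g]) for g in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            # Average linkage: mean distance over all cross-cluster gene pairs.
            avg = sum(dist[(a, b)] for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Toy data: g1/g2 rise together, g3/g4 fall together.
    toy = {
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8],
        "g3": [4, 3, 2, 1],
        "g4": [8, 6, 4, 2],
    }
    print(average_linkage(toy, 2))
```

Using 1 - r rather than Euclidean distance groups genes by the shape of their expression profiles instead of their absolute levels, which is the usual motivation for correlation-based proximity in gene expression clustering. The row-and-column seed expansion and pattern growing described in the abstract are not modeled here.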

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/905c/4083291/2b54a0efec3e/TSWJ2014-357873.001.jpg
