Dynamic association rules for gene expression data analysis.

Authors

Chen Shu-Chuan, Tsai Tsung-Hsien, Chung Cheng-Han, Li Wen-Hsiung

Affiliations

Department of Mathematics and Statistics, Idaho State University, Pocatello, ID, 83209, USA.

Department of Statistics, National Cheng-Kung University, Tainan, 701, Taiwan.

Publication information

BMC Genomics. 2015 Oct 14;16:786. doi: 10.1186/s12864-015-1970-x.

Abstract

BACKGROUND

The purpose of gene expression analysis is to find associations between the regulation of gene expression levels and phenotypic variation. Such associations, derived from gene expression profiles, have been used to determine whether the induction or repression of genes corresponds to phenotypic variation in areas including cell regulation, clinical diagnosis and drug development. Statistical analyses of microarray data have been developed to address the gene selection problem. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps researchers efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical method, based on constructing a one-sided confidence interval and hypothesis testing, to determine whether an association rule is meaningful. Based on this statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) dataset and the RNA-seq dataset of a mouse genomic imprinting study. The proposed method was also compared with the t-test on the expression profiling of the bone marrow of leukemia patients.
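For readers unfamiliar with market basket analysis, the two standard quantities behind an association rule A → B are its support (the fraction of transactions containing both A and B) and its confidence (the fraction of transactions containing A that also contain B). The sketch below computes both on a toy dataset. It only illustrates the textbook definitions, not the DAR algorithm itself; the item names (`geneA_up`, `disease`, and so on) and the encoding of each sample as a set of items are assumptions made for this example.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(transactions: Sequence[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: Sequence[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent. Returns 0.0 if the antecedent never occurs."""
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return n_both / n_antecedent


# Hypothetical toy data: each sample is the set of items observed for it.
samples = [
    frozenset({"geneA_up", "disease"}),
    frozenset({"geneA_up", "disease"}),
    frozenset({"geneA_up", "healthy"}),
    frozenset({"geneA_down", "healthy"}),
    frozenset({"geneA_down", "disease"}),
]

# Rule under study: geneA_up -> disease
rule_support = support(samples, frozenset({"geneA_up", "disease"}))
rule_confidence = confidence(
    samples, frozenset({"geneA_up"}), frozenset({"disease"})
)
print(f"support={rule_support:.3f}, confidence={rule_confidence:.3f}")
```

In this toy data, two of the five samples contain both items and three contain `geneA_up`, so the rule has support 2/5 and confidence 2/3.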

RESULTS

We developed a statistical method, based on the concept of the confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With these two thresholds, one can find significant rules in a single step. The DAR algorithm was then developed for gene expression data analysis. Results on four gene expression datasets showed that the proposed DAR algorithm not only identified a set of differentially expressed genes that largely agreed with those found by other methods, but also provided an efficient and accurate way to find influential genes of a disease.
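The abstract does not state the exact interval used, so the following is only a generic sketch of how a one-sided confidence interval can decide whether an observed proportion (such as a rule's support or confidence) is meaningfully above some reference value. It uses the normal-approximation (Wald) lower bound for a binomial proportion, which is an assumption of this example and may differ from the paper's construction. The function names, the `baseline` parameter and the default `alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_bound(successes: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower confidence bound for a proportion,
    using the normal (Wald) approximation. Assumed here for illustration."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return max(0.0, p_hat - z * sqrt(p_hat * (1 - p_hat) / n))


def exceeds_baseline(
    successes: int, n: int, baseline: float, alpha: float = 0.05
) -> bool:
    """Reject H0: p <= baseline when the lower bound lies above baseline."""
    return lower_bound(successes, n, alpha) > baseline


# Hypothetical example: a rule holds in 45 of the 50 samples that match
# its antecedent, so the observed confidence is 0.9.
bound = lower_bound(45, 50)
print(f"lower bound = {bound:.4f}")
print(exceeds_baseline(45, 50, baseline=0.50))
print(exceeds_baseline(45, 50, baseline=0.85))
```

With 45 successes out of 50, the lower bound is roughly 0.83, so the rule clears a baseline of 0.50 but not one of 0.85. The Wald bound is known to behave poorly for small samples or proportions near 0 or 1; a Wilson or exact bound would be a natural substitute in those cases.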

CONCLUSIONS

In this paper, the well-established association rule mining technique from marketing was modified so that the minimum support and minimum confidence are determined using confidence intervals and hypothesis testing. The modified technique can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie phenotypic variance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/b7f06d06268e/12864_2015_1970_Fig1_HTML.jpg
