
Dynamic association rules for gene expression data analysis.

Authors

Chen Shu-Chuan, Tsai Tsung-Hsien, Chung Cheng-Han, Li Wen-Hsiung

Affiliations

Department of Mathematics and Statistics, Idaho State University, Pocatello, ID, 83209, USA.

Department of Statistics, National Cheng-Kung University, Tainan, 701, Taiwan.

Publication

BMC Genomics. 2015 Oct 14;16:786. doi: 10.1186/s12864-015-1970-x.

DOI: 10.1186/s12864-015-1970-x
PMID: 26467206
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4606551/
Abstract

BACKGROUND

The purpose of gene expression analysis is to look for associations between the regulation of gene expression levels and phenotypic variation. Such associations, derived from gene expression profiles, have been used to determine whether the induction or repression of genes corresponds to phenotypic variation, with applications in cell regulation, clinical diagnosis and drug development. Statistical analyses of microarray data have been developed to address the gene selection problem. However, these methods do not tell us about causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical method, based on constructing a one-sided confidence interval and hypothesis testing, to determine whether an association rule is meaningful. Based on this method, we then develop the DAR algorithm for gene expression data analysis. The method was applied to four microarray datasets and one Next Generation Sequencing (NGS) dataset: the mouse Apo A1 dataset, the whole-genome expression dataset of mouse embryonic stem cells, expression profiles of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) dataset, and the RNA-seq dataset of a mouse genomic imprinting study. The proposed method was also compared with the t-test on the expression profiles of the bone marrow of leukemia patients.
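The abstract borrows support and confidence from market basket analysis. As a minimal sketch, assuming each sample is treated as a transaction whose items are discretized gene states plus the phenotype (the item names such as `GeneA_up` and the toy samples below are hypothetical, not data from the paper), the two measures for a rule X ⇒ Y can be computed as follows:

```python
from typing import FrozenSet, List

# Hypothetical toy data: each sample is a "transaction" of items. Gene items
# are assumed discretized states (e.g. "GeneA_up"); the phenotype is an item too.
Transaction = FrozenSet[str]


def count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions containing `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(rhs | lhs): count(lhs and rhs) / count(lhs)."""
    n_lhs = count(lhs, transactions)
    return count(lhs | rhs, transactions) / n_lhs if n_lhs else 0.0


samples: List[Transaction] = [
    frozenset({"GeneA_up", "GeneB_down", "Leukemia"}),
    frozenset({"GeneA_up", "Leukemia"}),
    frozenset({"GeneA_up", "GeneB_down"}),
    frozenset({"GeneB_down", "Leukemia"}),
    frozenset({"GeneA_down"}),
]

lhs, rhs = frozenset({"GeneA_up"}), frozenset({"Leukemia"})
print(support(lhs | rhs, samples))    # 0.4  (2 of 5 samples)
print(confidence(lhs, rhs, samples))  # 0.6666666666666666  (2 of the 3 GeneA_up samples)
```

Support measures how often the gene state and phenotype co-occur across all samples; confidence conditions on samples that show the gene state, which is why the two are thresholded separately.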

RESULTS

We developed a statistical method, based on the concept of a confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With these thresholds, significant rules can be found in a single step. The DAR algorithm was then developed for gene expression data analysis. Results on four gene expression datasets showed that the proposed DAR algorithm was not only able to identify a set of differentially expressed genes that largely agreed with those found by other methods, but also provided an efficient and accurate way to find genes that influence a disease.
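The Results describe fixing the minimum support and minimum confidence from a confidence interval so that significant rules emerge in one pass. The paper's exact formulas are not given here, so the following is only a sketch under stated assumptions: a one-sided Wald (normal-approximation) lower bound at level `alpha`, and a hypothetical reference confidence `c0` that a rule must beat. For a rule whose left-hand side occurs in `n_lhs` samples, it converts `c0` and `alpha` into the minimum number of co-occurrences needed, which is one way such a threshold could be set before mining.

```python
import math
from statistics import NormalDist
from typing import Optional


def lower_bound(hits: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower confidence bound for the proportion hits/n.
    Assumption: Wald normal approximation, not necessarily the paper's choice."""
    p_hat = hits / n
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    return p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n)


def rule_passes(hits: int, n_lhs: int, c0: float, alpha: float = 0.05) -> bool:
    """Reject H0: confidence <= c0 when the lower bound lies above c0
    (a one-sided Wald test using the estimated standard error)."""
    return n_lhs > 0 and lower_bound(hits, n_lhs, alpha) > c0


def min_hits(n_lhs: int, c0: float, alpha: float = 0.05) -> Optional[int]:
    """Smallest co-occurrence count, out of the n_lhs samples containing the
    rule's left-hand side, for which the rule passes; None if no count does."""
    for k in range(n_lhs + 1):
        if rule_passes(k, n_lhs, c0, alpha):
            return k
    return None


print(min_hits(20, 0.5))         # 14
print(rule_passes(13, 20, 0.5))  # False
```

With `n_lhs = 20`, `c0 = 0.5` and `alpha = 0.05`, 13 co-occurrences give a lower bound of about 0.475 and fail, while 14 give about 0.531 and pass. An exact (Clopper-Pearson) bound would be more reliable for small counts; the Wald form is used here only because it is short.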

CONCLUSIONS

In this paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concepts of confidence intervals and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie phenotypic variance.


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/b7f06d06268e/12864_2015_1970_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/30e7886ab3bc/12864_2015_1970_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/d7540e8d56cb/12864_2015_1970_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/1eb541e6aca5/12864_2015_1970_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40af/4606551/901189155b3f/12864_2015_1970_Fig5_HTML.jpg
