Mining gene expression databases for association rules.

Authors

Creighton Chad, Hanash Samir

Affiliation

Bioinformatics Program Pediatrics and Communicable Diseases, University of Michigan, Ann Arbor 48109, USA.

Publication information

Bioinformatics. 2003 Jan;19(1):79-86. doi: 10.1093/bioinformatics/19.1.79.

DOI: 10.1093/bioinformatics/19.1.79
PMID: 12499296

Abstract

MOTIVATION

Global gene expression profiling, both at the transcript level and at the protein level, can be a valuable tool in the understanding of genes, biological networks, and cellular states. As larger and larger gene expression data sets become available, data mining techniques can be applied to identify patterns of interest in the data. Association rules, used widely in the area of market basket analysis, can be applied to the analysis of expression data as well. Association rules can reveal biologically relevant associations between different genes or between environmental effects and gene expression. An association rule has the form LHS --> RHS, where LHS and RHS are disjoint sets of items, the RHS set being likely to occur whenever the LHS set occurs. Items in gene expression data can include genes that are highly expressed or repressed, as well as relevant facts describing the cellular environment of the genes (e.g. the diagnosis of a tumor sample from which a profile was obtained).
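As a rough illustration of the LHS --> RHS form described above, the sketch below scores a rule with support and confidence, the standard measures from market basket analysis. The abstract does not say which measures or thresholds the authors used, so these functions, the item names (`geneA_up` and so on), and the toy profiles are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List

# Each expression profile is modeled as a set of items, e.g. "geneA_up".
# The item names and the toy data are hypothetical, not from the paper.
Profile = FrozenSet[str]


def support(itemset: Iterable[str], profiles: List[Profile]) -> float:
    """Fraction of profiles that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if items <= p) / len(profiles)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               profiles: List[Profile]) -> float:
    """Among profiles containing LHS, the fraction that also contain RHS."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    if lhs_set & rhs_set:
        raise ValueError("LHS and RHS must be disjoint")
    n_lhs = sum(1 for p in profiles if lhs_set <= p)
    if n_lhs == 0:
        return 0.0
    n_both = sum(1 for p in profiles if (lhs_set | rhs_set) <= p)
    return n_both / n_lhs


profiles = [
    frozenset({"geneA_up", "geneB_up", "geneC_down"}),
    frozenset({"geneA_up", "geneB_up"}),
    frozenset({"geneA_up", "geneC_down"}),
    frozenset({"geneB_up"}),
]

rule_support = support({"geneA_up", "geneB_up"}, profiles)
rule_confidence = confidence({"geneA_up"}, {"geneB_up"}, profiles)
```

In this toy data the rule {geneA_up} --> {geneB_up} holds in two of the three profiles that contain geneA_up, so its confidence is 2/3, and both items appear together in two of the four profiles, so its support is 0.5.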

RESULTS

We demonstrate an algorithm for efficiently mining association rules from gene expression data, using the data set from Hughes et al. (2000, Cell, 102, 109-126) of 300 expression profiles for yeast. Using the algorithm, we find numerous rules in the data. A cursory analysis of some of these rules reveals numerous associations between certain genes, many of which make sense biologically, others suggesting new hypotheses that may warrant further investigation. In a data set derived from the yeast data set, but with the expression values for each transcript randomly shifted with respect to the experiments, no rules were found, indicating that almost all of the rules mined from the actual data set are not likely to have occurred by chance.
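The randomized control described above can be sketched as follows. The abstract does not specify how the values were "randomly shifted", so this example assumes one possible reading: each transcript's values are rotated by an independent random offset across the experiments, which keeps each transcript's distribution intact while breaking its alignment with the other transcripts. The function name `shift_rows`, the dictionary layout, and the toy values are hypothetical.

```python
import random
from typing import Dict, List


def shift_rows(matrix: Dict[str, List[float]],
               seed: int = 0) -> Dict[str, List[float]]:
    """Rotate each transcript's values by its own random offset.

    `matrix` maps a transcript name to its expression values, one value
    per experiment. Cyclic rotation is an assumption; the abstract only
    says the values were randomly shifted with respect to the experiments.
    """
    rng = random.Random(seed)
    shifted: Dict[str, List[float]] = {}
    for transcript, values in matrix.items():
        if not values:
            shifted[transcript] = []
            continue
        k = rng.randrange(len(values))  # offset drawn per transcript
        shifted[transcript] = values[k:] + values[:k]
    return shifted


# Toy matrix: 3 transcripts x 5 experiments (made-up log ratios).
toy = {
    "geneA": [1.2, 0.1, -0.3, 1.5, 0.0],
    "geneB": [1.1, 0.0, -0.2, 1.4, 0.1],
    "geneC": [-0.9, 0.2, 0.3, -1.1, 0.0],
}
control = shift_rows(toy, seed=42)
```

Mining the same rules on `control` and on the real data, with identical thresholds, gives a rough sense of how many rules would arise by chance alone.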

AVAILABILITY

An implementation of the algorithm using Microsoft SQL Server with Access 2000 is available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/assoc_rules.zip. Our results from mining the yeast data set are available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/yeast_results.zip.
