
Identification of functionally related genes using data mining and data integration: a breast cancer case study.

Affiliation

Istituto Tecnologie Biomediche, Consiglio Nazionale Ricerche, Via Fratelli Cervi 93, Segrate (MI), Italy.

Publication information

BMC Bioinformatics. 2009 Oct 15;10 Suppl 12(Suppl 12):S8. doi: 10.1186/1471-2105-10-S12-S8.

DOI: 10.1186/1471-2105-10-S12-S8
PMID: 19828084
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2762073/
Abstract

BACKGROUND

Identifying the organisation and dynamics of molecular pathways is crucial for understanding cell function. To reconstruct the molecular pathways through which a gene of interest regulates a cell, it is important to identify the set of genes with which it interacts to determine cell function. In this context, mining and integrating the large amount of publicly available data on the transcriptome and proteome states of a cell is a useful complement to biological research.

RESULTS

We describe an approach for identifying genes that interact with each other to regulate cell function. The strategy relies on the analysis of gene expression profile similarity across large expression datasets. During the similarity evaluation, the methodology determines the most significant subset of samples in which the evaluated genes are highly correlated. The strategy therefore excludes samples that are not relevant to each gene pair analysed. This feature is important for large sample sets characterised by heterogeneous experimental conditions, where different pools of biological processes can be active across the samples. The putative partners of the studied gene are then further characterised by analysing the distribution of Gene Ontology terms and integrating protein-protein interaction (PPI) data. The strategy was applied to analyse the functional relationships of a gene of known function, Pyruvate Kinase, and to predict functional partners of the human transcription factor TBX3. In both cases the analysis was performed on a dataset composed of primary breast tumour expression data derived from the literature. Integration and analysis of PPI data confirmed the predictions of the methodology: the genes identified as functionally related were associated with proteins that lie close together in the PPI network. Two of the predicted putative partners of TBX3 (GLI3 and GATA3) were confirmed by in vivo binding assays (cross-linking chromatin immunoprecipitation, X-ChIP), in which the putative DNA enhancer sequence sites of GATA3 and GLI3 were found to be bound by the Tbx3 protein.
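The abstract does not spell out how the sample subset is chosen or how closeness in the PPI network is measured, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes Pearson correlation as the similarity measure, a greedy backward elimination of samples as the subset search, and shortest-path length as the PPI proximity measure. The function names (`pearson`, `correlated_subset`, `ppi_distance`), the `target` and `min_samples` parameters, and the placeholder protein names are all hypothetical.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_subset(x, y, min_samples=5, target=0.9):
    """Greedily drop samples until |r| >= target or only min_samples remain.

    Assumption: at each step the sample whose removal raises |r| the most
    is discarded. Returns (kept sample indices, correlation on those samples).
    """
    keep = list(range(len(x)))
    r = pearson(x, y)
    while abs(r) < target and len(keep) > min_samples:
        best_r, best_i = r, None
        for i in keep:
            rest = [j for j in keep if j != i]
            cand = pearson([x[j] for j in rest], [y[j] for j in rest])
            if abs(cand) > abs(best_r):
                best_r, best_i = cand, i
        if best_i is None:  # no single removal improves the correlation
            break
        keep.remove(best_i)
        r = best_r
    return keep, r


def ppi_distance(edges, source, target):
    """Shortest-path length between two proteins in an undirected PPI graph.

    Returns None if either protein is absent or the two are not connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Toy data: two expression profiles that agree in every sample except index 6.
gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
gene_b = [2, 4, 6, 8, 10, 12, -5, 16]
kept, r = correlated_subset(gene_a, gene_b)

# Toy PPI edge list with placeholder protein names (not real interactions).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
hops = ppi_distance(toy_edges, "P1", "P3")
```

In this toy run the greedy search discards only the one discordant sample, which mirrors the idea of excluding samples that are not relevant to a given gene pair. A real analysis would also need a significance test for the retained subset, since removing samples always tends to inflate the correlation.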

CONCLUSION

The presented strategy is shown to be an effective approach for identifying genes that establish functional relationships. The methodology identifies and characterises genes with similar expression profiles by mining and integrating data from publicly available resources, contributing to a better understanding of gene regulation and cell function. The prediction of the TBX3 target genes GLI3 and GATA3 was experimentally confirmed.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79b7/2762073/cb8c8b96c34a/1471-2105-10-S12-S8-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79b7/2762073/0534fa2c13bd/1471-2105-10-S12-S8-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79b7/2762073/c49ba89a3dde/1471-2105-10-S12-S8-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79b7/2762073/70cee73af269/1471-2105-10-S12-S8-4.jpg

