Computing gene expression data with a knowledge-based gene clustering approach.

Author information

Rosa Bruce A, Oh Sookyung, Montgomery Beronda L, Chen Jin, Qin Wensheng

Publication information

Int J Biochem Mol Biol. 2010;1(1):51-68. Epub 2010 Jun 15.

Abstract

Computational analysis of gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. Obtaining the expression data is not difficult, but interpreting the datasets and extracting information from them is challenging. In this study, we used a knowledge-based approach to study light regulation. The approach identifies and retains important functional genes before filtering the dataset on variability and fold-change differences. Two clustering methods were used to cluster the filtered datasets, and the clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified and then ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering step reduced the dataset from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than the clusters produced by either clustering method alone. In addition, the filtering step increased the proportion of light-responsive genes in the dataset from 1.8% to 15.2%, and the common cluster lists increased this proportion further to 18.4%. The common cluster lists are relatively short compared with gene groups generated by typical clustering methods or coexpression networks. This narrows the search for novel functional genes while increasing the likelihood that the candidates are biologically relevant.
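
The abstract describes the pipeline only at a high level. The steps are:

1. Save the key genes.
2. Filter the remaining genes on variability and fold change.
3. Intersect the clusters that two methods assign to the key gene.
4. Rank the common genes by coexpression with the key gene.

The following is a minimal Python sketch of those steps, not the authors' implementation. It rests on these assumptions:

- Expression values are on a log2 scale.
- The thresholds `min_sd` and `min_log2_fc` are hypothetical placeholders, not the paper's values.
- Pearson correlation stands in for the coexpression measure.
- The two clusterings are passed in as gene-to-label mappings from any two methods.

All function names and the toy data are illustrative.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def knowledge_based_filter(expr, key_genes, min_sd=0.5, min_log2_fc=1.0):
    """Keep every key gene unconditionally, then keep other genes only if they
    are variable (population SD) and show a large fold change.

    expr maps gene -> log2 expression values across samples (assumed scale).
    Both thresholds are hypothetical placeholders.
    """
    kept = {g: v for g, v in expr.items() if g in key_genes}  # saved before filtering
    for g, v in expr.items():
        if g in kept:
            continue
        log2_fc = max(v) - min(v)  # on a log2 scale, a difference is a log2 fold change
        if pstdev(v) >= min_sd and log2_fc >= min_log2_fc:
            kept[g] = v
    return kept


def common_cluster(labels_a, labels_b, key_gene):
    """Genes that both clusterings place in the same cluster as key_gene."""
    in_a = {g for g, c in labels_a.items() if c == labels_a[key_gene]}
    in_b = {g for g, c in labels_b.items() if c == labels_b[key_gene]}
    return (in_a & in_b) - {key_gene}


def rank_by_coexpression(expr, genes, key_gene):
    """Sort genes by Pearson correlation with key_gene, strongest first."""
    scores = [(g, pearson(expr[g], expr[key_gene])) for g in genes]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: gene names, values and cluster labels are invented for illustration.
expr = {
    "KEY1": [1.0, 2.0, 3.0, 4.0],
    "KEY2": [3.0, 3.0, 3.0, 3.0],  # flat, but kept because it is a key gene
    "G1": [1.1, 2.1, 2.9, 4.2],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [2.0, 2.0, 2.0, 2.1],    # nearly flat, removed by the filter
    "G4": [0.5, 1.5, 3.5, 3.9],
}
filtered = knowledge_based_filter(expr, key_genes={"KEY1", "KEY2"})

# Hypothetical labels from two different clustering methods run on `filtered`.
labels_a = {"KEY1": 0, "G1": 0, "G4": 0, "G2": 1, "KEY2": 1}
labels_b = {"KEY1": "x", "G1": "x", "G4": "x", "G2": "x", "KEY2": "y"}

common = common_cluster(labels_a, labels_b, "KEY1")
ranked = rank_by_coexpression(filtered, common, "KEY1")
print(sorted(filtered))
print(ranked)
```

In the toy run, G2 is dropped from KEY1's common cluster because only one of the two clusterings groups it with KEY1. Requiring agreement between both methods is what keeps the common lists short relative to either cluster alone. This matches the effect the abstract reports, an average of 14 genes per common list.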
