Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

Authors

Nepomuceno Juan A, Troncoso Alicia, Nepomuceno-Chamorro Isabel A, Aguilar-Ruiz Jesús S

Affiliations

1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, Seville, 41012 Spain.

2. Área de Informática, Universidad Pablo de Olavide, Ctra. Utrera km. 1, Seville, 41013 Spain.

Publication details

BioData Min. 2018 Mar 27;11:4. doi: 10.1186/s13040-018-0165-9. eCollection 2018.

DOI: 10.1186/s13040-018-0165-9
PMID: 29610579
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5872503/

Abstract

BACKGROUND

Biclustering algorithms search gene expression data for groups of genes that behave similarly under a subset of samples. The biological knowledge now available in public repositories can be used to guide these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance between genes can be defined from the information stored for them in the Gene Ontology (GO): pairwise gene GO semantic similarity measures assign each pair of genes a value that quantifies their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information. The merit function includes a term that captures this information through a GO measure.
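The idea of a merit function with a GO term can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual formulation: the Jaccard overlap of GO annotation sets stands in for a pairwise GO semantic similarity measure, the mean absolute Pearson correlation stands in for the expression term, and the names `go_similarity`, `merit`, and `weight` are hypothetical. Lower values indicate a better candidate bicluster.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def go_similarity(terms_a, terms_b):
    """Assumed stand-in for a pairwise GO measure: Jaccard overlap of
    the two genes' GO annotation sets, a value in [0, 1]."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def merit(expr, go_terms, genes, samples, weight=0.5):
    """Hypothetical merit of a bicluster (genes x samples); lower is better.

    Combines an expression term (1 - mean |correlation| over gene pairs,
    restricted to the selected samples) with a GO term (1 - mean pairwise
    GO similarity). `weight` balances the two terms.
    """
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("a bicluster needs at least two genes")
    corr = mean(
        abs(pearson([expr[g][s] for s in samples],
                    [expr[h][s] for s in samples]))
        for g, h in pairs
    )
    go = mean(go_similarity(go_terms[g], go_terms[h]) for g, h in pairs)
    return (1 - weight) * (1 - corr) + weight * (1 - go)


# Toy data, invented for this example only.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
go_terms = {
    "g1": {"GO:0006412", "GO:0008150"},
    "g2": {"GO:0006412", "GO:0008150"},
    "g3": {"GO:0007049"},
}
coherent = merit(expr, go_terms, ["g1", "g2"], [0, 1, 2, 3])
incoherent = merit(expr, go_terms, ["g1", "g3"], [0, 1, 2, 3])
```

In this toy setup the pair `g1`, `g2` is perfectly correlated and shares all annotations, so it scores better (lower) than the pair `g1`, `g3`. A search procedure such as scatter search would then look for gene and sample subsets that minimize a function of this kind.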

RESULTS

The effect of two different pairwise gene GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes are studied. Second, a group of human datasets related to clinical cancer data is explored with the algorithm. Most of these are high-dimensional datasets composed of a very large number of genes. When the search procedure is driven by one of the proposed GO measures, the resulting biclusters reveal groups of genes linked by the same function. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer perspective.

CONCLUSIONS

It can be concluded that integrating biological information improves the performance of the biclustering process. Both GO measures studied improve the results obtained for the yeast datasets. However, when datasets are composed of a very large number of genes, only one of the two measures actually improves the algorithm's performance. This second case constitutes a clear option for exploring datasets that are interesting from a clinical point of view.
