A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data.

Affiliation

Applied Mathematics Department, Agrocampus Ouest, 65, rue de Saint-Brieuc, Rennes, France.

Publication

BMC Bioinformatics. 2013 Feb 7;14:42. doi: 10.1186/1471-2105-14-42.

PMID: 23387364
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3635920/
Abstract

BACKGROUND

Gene clustering algorithms are widely used by biologists to analyse omics data. Classical gene clustering strategies rely on expression data alone, either directly, as in Heatmaps, or indirectly, as in clustering based on coexpression networks. However, these classical strategies may not be sufficient to bring out all potential relationships among genes.

RESULTS

We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data. Two genes are therefore close when they have both similar expression profiles and similar functional profiles at the same time. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic procedure for evaluating gene clusters. This procedure is based on two indicators that measure the global coexpression and the biological homogeneity of gene clusters. Each indicator is associated with a hypothesis test, which complements it with a p-value. Our clustering algorithm is compared with Heatmap clustering and with clustering based on a gene coexpression network, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation.
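The abstract states that the new distance makes two genes close only when their expression profiles and their functional profiles agree at once, and that a classical algorithm such as K-means is then applied, but it does not give the distance's formula. The sketch below is therefore a hypothetical stand-in, not the authors' method. It assumes an expression distance of 1 - r (r being the Pearson correlation) and a functional distance equal to the number of GO terms annotated to one gene but not the other, divided by the number of terms considered, mixed with an assumed weight `alpha`. Because K-means works on coordinates rather than on an arbitrary distance matrix, each gene is embedded as a scaled z-scored expression profile concatenated with a scaled 0/1 GO-term vector, so that the squared Euclidean distance between two embedded genes equals exactly that weighted mix. The names `embed_genes` and `kmeans`, the toy genes and the `GO:A`/`GO:B` labels are illustrative only.

```python
import math


def zscore(profile):
    """Center and scale a profile to mean 0 and population standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:  # a flat profile carries no correlation information
        return [0.0] * n
    return [(x - mean) / sd for x in profile]


def embed_genes(expression, annotations, go_terms, alpha=0.5):
    """Map each gene to one feature vector (hypothetical weighting, not the paper's).

    Squared Euclidean distance between two embedded genes equals
    alpha * (1 - r) + (1 - alpha) * |GO sets' symmetric difference| / len(go_terms),
    where r is the Pearson correlation of the two expression profiles.
    """
    m = len(go_terms)
    vectors = {}
    for gene, profile in expression.items():
        n = len(profile)
        # For z-scored profiles, ||z - z'||^2 = 2n(1 - r); rescale to alpha * (1 - r).
        expr_part = [math.sqrt(alpha / (2 * n)) * z for z in zscore(profile)]
        terms = annotations.get(gene, set())
        go_part = [math.sqrt((1 - alpha) / m) * (1.0 if t in terms else 0.0) for t in go_terms]
        vectors[gene] = expr_part + go_part
    return vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    genes = sorted(vectors)
    centroids = [list(vectors[genes[0]])]
    while len(centroids) < k:
        farthest = max(genes, key=lambda g: min(sq_dist(vectors[g], c) for c in centroids))
        centroids.append(list(vectors[farthest]))
    labels = {}
    for _ in range(n_iter):
        new_labels = {
            g: min(range(k), key=lambda c: sq_dist(vectors[g], centroids[c]))
            for g in genes
        }
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [vectors[g] for g in genes if labels[g] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (hypothetical): two rising genes annotated GO:A, two falling genes annotated GO:B.
expression = {"g1": [1, 2, 3, 4], "g2": [1, 2, 4, 4], "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2]}
annotations = {"g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:B"}, "g4": {"GO:B"}}
vectors = embed_genes(expression, annotations, ["GO:A", "GO:B"], alpha=0.5)
print(kmeans(vectors, k=2))  # {'g1': 0, 'g2': 0, 'g3': 1, 'g4': 1}
```

With `alpha=1` the GO part vanishes and this reduces to correlation-based K-means on expression alone; lowering `alpha` lets shared annotations pull genes together. Farthest-first seeding is used here only to keep the toy run deterministic; k-means++ or several random restarts would be more usual on real data.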

CONCLUSION

Our new clustering algorithm provides a higher proportion of good candidates for interpretation. We therefore expect the interpretation of these clusters to help biologists formulate new hypotheses about the relationships among genes.
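The Results describe two cluster-level indicators, global coexpression and biological homogeneity, each paired with a hypothesis test that yields a p-value, and the conclusion compares methods by the proportion of clusters that are good candidates for interpretation. The abstract defines neither the indicators nor the tests, so what follows is a hedged sketch under stated assumptions rather than the paper's procedure: coexpression is taken as the mean pairwise Pearson correlation within a cluster, homogeneity as the mean pairwise Jaccard similarity of GO-term sets, and each p-value comes from a one-sided permutation test against random gene sets of the same size. A cluster is counted as a good candidate when both p-values fall below `level`. All function names are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jaccard(a, b):
    """Jaccard similarity of two GO-term sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise(genes, sim):
    """Average similarity over all unordered gene pairs (needs at least 2 genes)."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(sim(g, h) for g, h in pairs) / len(pairs)


def permutation_test(cluster, all_genes, sim, n_perm=999, seed=0):
    """Indicator value and one-sided permutation p-value vs. random gene sets of equal size."""
    rng = random.Random(seed)
    observed = mean_pairwise(cluster, sim)
    hits = sum(
        mean_pairwise(rng.sample(all_genes, len(cluster)), sim) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


def proportion_good_candidates(clusters, all_genes, expression, annotations, level=0.05):
    """Fraction of clusters significant for BOTH indicators at the given level."""
    def coexpression(g, h):
        return pearson(expression[g], expression[h])

    def homogeneity(g, h):
        return jaccard(annotations.get(g, set()), annotations.get(h, set()))

    good = 0
    for cluster in clusters:
        _, p_expr = permutation_test(cluster, all_genes, coexpression)
        _, p_go = permutation_test(cluster, all_genes, homogeneity)
        if p_expr < level and p_go < level:
            good += 1
    return good / len(clusters)
```

The `(hits + 1) / (n_perm + 1)` form keeps a permutation p-value strictly positive. Every cluster must contain at least two genes, the cluster list must be non-empty, and constant expression profiles would make the Pearson correlation undefined.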


Figures
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e934/3635920/48c364584561/1471-2105-14-42-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e934/3635920/4d63912a4db3/1471-2105-14-42-2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e934/3635920/833216ce94e7/1471-2105-14-42-3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e934/3635920/b8d23279abac/1471-2105-14-42-4.jpg

Similar articles

1. A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data. BMC Bioinformatics. 2013 Feb 7;14:42. doi: 10.1186/1471-2105-14-42.
2. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397.
3. Evaluation of clustering algorithms for gene expression data using gene ontology annotations. Chin Med J (Engl). 2012 Sep;125(17):3048-52.
4. Knowledge-assisted recognition of cluster boundaries in gene expression data. Artif Intell Med. 2005 Sep-Oct;35(1-2):171-83. doi: 10.1016/j.artmed.2005.02.007.
5. Comparisons of graph-structure clustering methods for gene expression data. Acta Biochim Biophys Sin (Shanghai). 2006 Jun;38(6):379-84. doi: 10.1111/j.1745-7270.2006.00175.x.
6. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Programs Biomed. 2015 May;119(3):163-80. doi: 10.1016/j.cmpb.2015.02.010. Epub 2015 Mar 18.
7. A phase synchronization clustering algorithm for identifying interesting groups of genes from cell cycle expression data. BMC Bioinformatics. 2008 Jan 28;9:56. doi: 10.1186/1471-2105-9-56.
8. Clustering gene expression data using a diffraction-inspired framework. Biomed Eng Online. 2012 Nov 19;11:85. doi: 10.1186/1475-925X-11-85.
9. A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings. Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.
10. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. J Bioinform Comput Biol. 2020 Jun;18(3):2040009. doi: 10.1142/S0219720020400090.

Articles citing this article

1. Integrative clustering methods for multi-omics data. Wiley Interdiscip Rev Comput Stat. 2022 May-Jun;14(3). doi: 10.1002/wics.1553. Epub 2021 Feb 7.
2. Pairwise gene GO-based measures for biclustering of high-dimensional expression data. BioData Min. 2018 Mar 27;11:4. doi: 10.1186/s13040-018-0165-9. eCollection 2018.
3. Semantic biclustering for finding local, interpretable and predictive expression patterns. BMC Genomics. 2017 Oct 16;18(Suppl 7):752. doi: 10.1186/s12864-017-4132-5.
