A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

Author information

Parraga-Alava Jorge, Dorn Marcio, Inostroza-Ponta Mario

Affiliations

1. Centre for Biotechnology and Bioengineering (CeBiB), Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile.

2. Carrera de Computación, Escuela Superior Politécnica Agropecuaria de Manabí Manuel Félix López, Campus Politécnico Sitio El Limón, Calceta, Ecuador.

Publication information

BioData Min. 2018 Aug 7;11:16. doi: 10.1186/s13040-018-0178-4. eCollection 2018.

DOI:10.1186/s13040-018-0178-4

PMID:30100924

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6081857/

Abstract

BACKGROUND

Biologists aim to understand the genetic background of diseases, metabolic disorders and other genetic conditions. Microarrays are one of the main high-throughput technologies for collecting information on the behaviour of genes under different conditions. Clustering is one of the main techniques used to analyse these data; it aims to find groups of genes that share some criterion, such as a similar expression profile. However, the problem of finding such groups is normally multidimensional, which makes it necessary to treat clustering as a multi-objective problem in which several cluster validity indexes are optimised simultaneously. These indexes are usually based on criteria such as compactness and separation, which may not be sufficient because they cannot guarantee clusters that have both similar expression patterns and biological coherence.
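
The compactness and separation criteria mentioned above can be made concrete with a small sketch. It assumes Euclidean distance, compactness as the mean gene-to-centroid distance, and separation as the minimum centroid-to-centroid distance. These are common textbook choices, not necessarily the validity indexes used by MOC-GaPBK, and every function name here is hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: cluster id -> list of gene expression profiles.
Clusters = Dict[int, List[Sequence[float]]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(profiles: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the profiles in one cluster."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def compactness(clusters: Clusters) -> float:
    """Mean distance from each gene to its cluster centroid (lower is better)."""
    total, count = 0.0, 0
    for members in clusters.values():
        c = centroid(members)
        total += sum(euclidean(p, c) for p in members)
        count += len(members)
    return total / count


def separation(clusters: Clusters) -> float:
    """Smallest distance between two cluster centroids (higher is better)."""
    cents = [centroid(members) for members in clusters.values()]
    if len(cents) < 2:
        raise ValueError("separation needs at least two clusters")
    return min(euclidean(cents[i], cents[j])
               for i in range(len(cents)) for j in range(i + 1, len(cents)))


# Toy data: two tight clusters of two genes, each measured under two conditions.
toy = {0: [[0.0, 0.0], [0.0, 2.0]], 1: [[10.0, 0.0], [10.0, 2.0]]}
print(compactness(toy), separation(toy))  # 1.0 10.0
```

Both indexes look only at expression values, which is exactly the limitation the background points out: a clustering can score well here while grouping functionally unrelated genes.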

METHOD

We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high co-expression and biological coherence, as well as good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the expression level and at the level of biological function. The proposal also includes intensification and diversification strategies to improve the search process.
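
The expression-level and knowledge-level objectives described above could be scored along the following lines. This sketch assumes that co-expression is the mean pairwise Pearson correlation within clusters and that biological coherence is the mean pairwise Jaccard overlap of each gene's annotation terms (for example, GO terms). The abstract does not give these formulas, the term names are placeholders, and MOC-GaPBK may use different measures such as GO-based semantic similarity.

```python
import itertools
import math
import statistics
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical representation: cluster id -> list of gene identifiers.
GeneClusters = Dict[int, List[str]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def jaccard(s: Set[str], t: Set[str]) -> float:
    """Overlap of two annotation sets; a simple stand-in for a semantic similarity."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def mean_intra_pair(clusters: GeneClusters, score: Callable[[str, str], float]) -> float:
    """Average a pairwise score over every pair of genes that share a cluster."""
    values = [score(g, h)
              for members in clusters.values()
              for g, h in itertools.combinations(members, 2)]
    return statistics.fmean(values) if values else 0.0


def co_expression(clusters: GeneClusters, expr: Dict[str, Sequence[float]]) -> float:
    return mean_intra_pair(clusters, lambda g, h: pearson(expr[g], expr[h]))


def biological_coherence(clusters: GeneClusters, annot: Dict[str, Set[str]]) -> float:
    return mean_intra_pair(clusters, lambda g, h: jaccard(annot[g], annot[h]))


expr = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1], "g4": [6, 4, 2]}
annot = {"g1": {"termA", "termB"}, "g2": {"termA"}, "g3": {"termC"}, "g4": {"termC"}}
clusters = {0: ["g1", "g2"], 1: ["g3", "g4"]}
print(co_expression(clusters, expr), biological_coherence(clusters, annot))  # 1.0 0.75
```

Routing both scores through the same `mean_intra_pair` helper keeps the two objectives on a comparable footing: each is an average over the same set of within-cluster gene pairs, differing only in whether the pair is compared by expression profile or by prior annotation.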

RESULTS

The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. We report comparative studies of different choices of objective functions and of other widely used microarray clustering techniques. Statistical, visual and biological significance tests are carried out to show the superiority of the proposed algorithm.

CONCLUSIONS

Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find higher-quality solutions than other microarray clustering techniques in the literature, in terms of co-expression, biological coherence, compactness and separation.
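
Comparing solutions on co-expression, biological coherence, compactness and separation at once is a Pareto-dominance question. The sketch below shows the standard dominance test and a naive non-dominated filter. The objective order and the optimisation directions are assumptions for illustration, and this quadratic filter is not claimed to be the selection mechanism of MOC-GaPBK.

```python
from typing import List, Sequence

# Assumed objective vector: (co_expression, biological_coherence, compactness, separation).
# +1 means "maximise", -1 means "minimise" (compactness as a mean distance is minimised).
SIGNS = (1, 1, -1, 1)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(s * x >= s * y for s, x, y in zip(SIGNS, a, b))
    better = any(s * x > s * y for s, x, y in zip(SIGNS, a, b))
    return no_worse and better


def non_dominated(solutions: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the solutions that no other solution dominates (the Pareto front)."""
    return [a for a in solutions if not any(dominates(b, a) for b in solutions)]


candidates = [
    (0.9, 0.7, 1.0, 10.0),  # strong on the expression-based criteria
    (0.8, 0.6, 1.2, 9.0),   # worse than the first on every objective
    (0.7, 0.9, 1.5, 8.0),   # trades co-expression for biological coherence
]
print(non_dominated(candidates))  # [(0.9, 0.7, 1.0, 10.0), (0.7, 0.9, 1.5, 8.0)]
```

Neither surviving candidate is better on every criterion, which is why Pareto-based methods typically return a set of trade-off clusterings rather than a single answer.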

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b5d/6081857/282858f98af7/13040_2018_178_Fig1_HTML.jpg

Similar articles

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

BioData Min. 2018 Aug 7;11:16. doi: 10.1186/s13040-018-0178-4. eCollection 2018.

Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

BMC Bioinformatics. 2009 Jan 20;10:27. doi: 10.1186/1471-2105-10-27.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Influence of the go-based semantic similarity measures in multi-objective gene clustering algorithm performance.

J Bioinform Comput Biol. 2020 Dec;18(6):2050038. doi: 10.1142/S0219720020500389. Epub 2020 Nov 5.

Biclustering of microarray data with MOSPO based on crowding distance.

BMC Bioinformatics. 2009 Apr 29;10 Suppl 4(Suppl 4):S9. doi: 10.1186/1471-2105-10-S4-S9.

Metric for measuring the effectiveness of clustering of DNA microarray expression.

BMC Bioinformatics. 2006 Sep 6;7 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-7-S2-S5.

Knowledge-assisted recognition of cluster boundaries in gene expression data.

Artif Intell Med. 2005 Sep-Oct;35(1-2):171-83. doi: 10.1016/j.artmed.2005.02.007.

A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data.

BMC Bioinformatics. 2013 Feb 7;14:42. doi: 10.1186/1471-2105-14-42.

Use of Semisupervised Clustering and Feature-Selection Techniques for Identification of Co-expressed Genes.

IEEE J Biomed Health Inform. 2016 Jul;20(4):1171-7. doi: 10.1109/JBHI.2015.2451735. Epub 2015 Jul 20.

Fusion of expression values and protein interaction information using multi-objective optimization for improving gene clustering.

Comput Biol Med. 2017 Oct 1;89:31-43. doi: 10.1016/j.compbiomed.2017.07.015. Epub 2017 Aug 1.

Cited by

A hybrid multi-objective whale optimization algorithm for analyzing microarray data based on Apache Spark.

PeerJ Comput Sci. 2021 Mar 25;7:e416. doi: 10.7717/peerj-cs.416. eCollection 2021.

RoCoLe: A coffee leaf images dataset for evaluation of machine learning based methods in plant diseases recognition.

Data Brief. 2019 Aug 19;25:104414. doi: 10.1016/j.dib.2019.104414. eCollection 2019 Aug.
