Digging for Significant Genes in Microarray Expression Data Based on Systematic Sampling and Hierarchal Clustering Algorithm.

Author information

University of Sulaimani, College of Science, Computer Department, Sulaymaniyah, Iraq.

Publication information

Adv Exp Med Biol. 2021;1338:1-6. doi: 10.1007/978-3-030-78775-2_1.

Abstract

Obesity is a worldwide health problem. Eating habits have changed during this decade, and an increase in the intake of high-fat foods and sugar has been observed, which is associated with obesity and weight gain. Therefore, in this chapter, we have analysed microarray expression data for obese and lean individuals. Microarray technology simultaneously records the expression levels of thousands of genes across related samples and during biological processes. Microarray datasets are rich in crucial information that has to be examined. In the study discussed in this chapter, the microarray datasets are pre-processed prior to analysis, in which upregulated and downregulated gene groups are identified. Clustering is a learning technique that is applied in many fields of study. Microarray data can be clustered by genes or by samples, depending on the type of dataset. A hierarchical clustering algorithm was used to detect gene patterns in our candidate datasets, since microarray data are considered big and complex. A systematic sampling technique was used to reduce the complexity of the microarray datasets and to enhance the clustering quality. This technique is a simple and effective sampling technique. The proposed algorithm, Systematic Sampling with Hierarchical Clustering (SSHC), could detect significant gene patterns in the datasets, and the proposed system shows better performance. The validity index used to evaluate the SSHC algorithm is the adjusted Rand index (ARI).
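The abstract names three steps: systematic sampling, hierarchical clustering, and evaluation with the adjusted Rand index. The sketch below is one possible way to chain them and is not the authors' implementation. The synthetic expression matrix, the sampling step of 4, the average-linkage Euclidean clustering, and the helper names `systematic_sample`, `adjusted_rand_index`, and `run_demo` are all assumptions made for illustration.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def systematic_sample(n_items, step, start=0):
    """Indices start, start + step, start + 2 * step, ... below n_items."""
    return np.arange(start, n_items, step)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    _, a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    # Contingency table: table[i, j] = items with label i in a and j in b.
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def run_demo(step=4):
    """Sample genes systematically, cluster the sample, score with ARI."""
    # Assumed toy data: 200 "upregulated" and 200 "downregulated" genes
    # measured across 12 samples (rows are genes, columns are samples).
    rng = np.random.default_rng(0)
    up = rng.normal(2.0, 0.5, size=(200, 12))
    down = rng.normal(-2.0, 0.5, size=(200, 12))
    expr = np.vstack([up, down])
    truth = np.repeat([0, 1], 200)

    # Keep every step-th gene to shrink the clustering problem.
    idx = systematic_sample(len(expr), step)

    # Agglomerative clustering on the sampled genes, cut into two groups.
    tree = linkage(expr[idx], method="average", metric="euclidean")
    labels = fcluster(tree, t=2, criterion="maxclust")

    return idx, labels, adjusted_rand_index(truth[idx], labels)


idx, labels, ari = run_demo()
print(f"sampled genes: {len(idx)}, ARI on the sample: {ari:.3f}")
```

Sampling before clustering matters because `linkage` builds a pairwise distance structure whose cost grows quadratically with the number of rows, so keeping every fourth gene cuts that cost by roughly a factor of sixteen. Whether clustering quality holds up on real microarray data depends on the ordering of the genes and on the step chosen, which this toy example does not address.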
