Signature Genes Selection and Functional Analysis of Astrocytoma Phenotypes: A Comparative Study.

Authors

Drozdz Anna, McInerney Caitriona E, Prise Kevin M, Spence Veronica J, Sousa Jose

Affiliations

Sano-Centre for Computational Personalised Medicine-International Research Foundation, Czarnowiejska 36, 30-054 Kraków, Poland.

Patrick G. Johnson Centre for Cancer Research, Queen's University Belfast, BT9 7AE Belfast, Ireland.

Publication information

Cancers (Basel). 2024 Sep 25;16(19):3263. doi: 10.3390/cancers16193263.

Abstract

The discovery of novel cancer biomarkers is driven by the application of omics technologies. The vast quantity of high-dimensional data necessitates feature selection. The mathematical basis of different selection methods varies considerably, which may influence subsequent inferences. In this study, feature selection and classification methods were employed to identify six signature gene sets for grade 2 and 3 astrocytoma samples from the Rembrandt repository. Subsequently, the impact of these variables on classification and on the further discovery of biological patterns was analysed. Principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and hierarchical clustering revealed that the data set (10,096 genes) exhibited a high degree of noise, feature redundancy, and a lack of distinct patterns. Applying the feature selection methods reduced the number of genes to between 28 and 128. Notably, no single gene was selected by all of the methods tested. Selection increased classification accuracy and reduced noise. Significant differences in the Gene Ontology terms were found, with only 13 terms overlapping. One selection method did not result in any enriched terms. KEGG pathway analysis revealed only one pathway in common (cell cycle), while two methods did not yield any enriched pathways. The results demonstrated a significant difference in outcomes when classification-type algorithms were used compared with mixed types (selection and classification). This may result in the inadvertent omission of biological phenomena while simultaneously achieving enhanced classification outcomes.
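The overlap comparison described in the abstract (which genes are selected by every method, and how similar any two signature sets are) can be illustrated with a minimal sketch. This is not the authors' pipeline: the method names, gene identifiers, and helper functions `shared_by_all` and `pairwise_jaccard` are hypothetical placeholders, and the Jaccard index is used here only as one common way to quantify set overlap.

```python
from itertools import combinations


def shared_by_all(gene_sets):
    """Return the genes present in every signature set."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


def pairwise_jaccard(gene_sets):
    """Jaccard index |A & B| / |A | B| for every pair of methods."""
    scores = {}
    for (name_a, a), (name_b, b) in combinations(gene_sets.items(), 2):
        union = a | b
        scores[(name_a, name_b)] = len(a & b) / len(union) if union else 0.0
    return scores


# Hypothetical placeholder data: not taken from the study.
signatures = {
    "method_1": {"GENE_A", "GENE_B", "GENE_C"},
    "method_2": {"GENE_B", "GENE_C", "GENE_D"},
    "method_3": {"GENE_E", "GENE_F"},
}

common = shared_by_all(signatures)
jaccard = pairwise_jaccard(signatures)

print("Selected by every method:", sorted(common))
for pair, score in jaccard.items():
    print(pair, round(score, 2))
```

With real signature sets in place of the placeholders, an empty result from `shared_by_all` would correspond to the abstract's observation that no single gene was selected by all methods.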

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4613/11476064/fee427d7a880/cancers-16-03263-g001.jpg
