Visualization-based cancer microarray data classification analysis.

Authors

Mramor Minca, Leban Gregor, Demsar Janez, Zupan Blaz

Affiliation

Faculty of Computer and Information Science, University of Ljubljana, Trzaska 25, 1000 Ljubljana, Slovenia.

Publication information

Bioinformatics. 2007 Aug 15;23(16):2147-54. doi: 10.1093/bioinformatics/btm312. Epub 2007 Jun 22.

DOI:10.1093/bioinformatics/btm312

PMID:17586552

Abstract

MOTIVATION

Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples while at the same time providing an insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often cover well only one of these aspects, motivating the development of methods where predictive models with a solid classification performance would be easily communicated to the domain expert.

RESULTS

Data visualization can be an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation between data instances of different classes. Here we extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, and demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet clearly differentiate among cancer types. We also report that our approach to classification through visualization achieves performance comparable to state-of-the-art supervised data mining techniques.
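To make the idea of scoring and ranking point-based projections more concrete, the following is a minimal sketch in plain Python. It is not the VizRank implementation and not the Orange API. The radviz mapping follows the standard definition (one anchor per feature on the unit circle, each sample placed at the average of the anchors weighted by its feature values, which are assumed to be scaled to [0, 1]). The scoring rule is an assumption: the abstract only says projections are scored by class separation, so leave-one-out k-nearest-neighbour accuracy in the projected plane is used here as a stand-in, and the exhaustive search over feature subsets is likewise an illustrative choice. The names `radviz_project`, `separation_score` and `rank_projections` are hypothetical.

```python
import math
from itertools import combinations


def radviz_project(rows):
    """Map rows of feature values (assumed scaled to [0, 1]) to 2-D radviz points."""
    n_features = len(rows[0])
    # One anchor per feature, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_features),
         math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]
    points = []
    for row in rows:
        total = sum(row)
        if total == 0:
            # No anchor pulls on the sample, so it stays at the centre.
            points.append((0.0, 0.0))
            continue
        x = sum(v * a[0] for v, a in zip(row, anchors)) / total
        y = sum(v * a[1] for v, a in zip(row, anchors)) / total
        points.append((x, y))
    return points


def separation_score(points, labels, k=3):
    """Leave-one-out k-NN accuracy in the plane (assumed scoring rule)."""
    correct = 0
    for i, p in enumerate(points):
        # The k nearest other points, with their class labels.
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points) if j != i
        )[:k]
        votes = {}
        for _, label in neighbours:
            votes[label] = votes.get(label, 0) + 1
        # Majority vote; ties go to the alphabetically first label.
        predicted = max(sorted(votes), key=votes.get)
        correct += predicted == labels[i]
    return correct / len(points)


def rank_projections(data, labels, n_features=3, k=3):
    """Score every subset of n_features columns and return them best first."""
    ranked = []
    for subset in combinations(range(len(data[0])), n_features):
        rows = [[row[j] for j in subset] for row in data]
        score = separation_score(radviz_project(rows), labels, k)
        ranked.append((score, subset))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked
```

Calling `rank_projections(data, labels)` on a small table whose columns are scaled gene-expression values returns `(score, column_subset)` pairs, so the top entries are the candidate projections to inspect. Exhaustive enumeration is only feasible for a handful of columns; with thousands of genes some form of pre-selection or heuristic search would be needed.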

AVAILABILITY

VizRank and radviz are implemented as part of the Orange data mining suite (http://www.ailab.si/orange).

SUPPLEMENTARY INFORMATION

Supplementary data are available from http://www.ailab.si/supp/bi-cancer.

Similar articles

Large scale data mining approach for gene-specific standardization of microarray gene expression data.

Bioinformatics. 2006 Dec 1;22(23):2898-904. doi: 10.1093/bioinformatics/btl500. Epub 2006 Oct 10.

Meta-analysis of gene expression data: a predictor-based approach.

Bioinformatics. 2007 Jul 1;23(13):1599-606. doi: 10.1093/bioinformatics/btm149. Epub 2007 Apr 26.

An insight-based methodology for evaluating bioinformatics visualizations.

IEEE Trans Vis Comput Graph. 2005 Jul-Aug;11(4):443-56. doi: 10.1109/TVCG.2005.53.

Pathway recognition and augmentation by computational analysis of microarray expression data.

Bioinformatics. 2006 Jan 15;22(2):233-41. doi: 10.1093/bioinformatics/bti764. Epub 2005 Nov 8.

Classification of microarray data with factor mixture models.

Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.

VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles.

Bioinformatics. 2006 Sep 1;22(17):2066-73. doi: 10.1093/bioinformatics/btl359. Epub 2006 Jul 4.

FreeViz--an intelligent multivariate visualization approach to explorative analysis of biomedical data.

J Biomed Inform. 2007 Dec;40(6):661-71. doi: 10.1016/j.jbi.2007.03.010. Epub 2007 Apr 20.

Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices.

Bioinformatics. 2005 May 15;21(10):2301-8. doi: 10.1093/bioinformatics/bti329. Epub 2005 Feb 18.

BlotBase: a northern blot database.

Gene. 2008 Dec 31;427(1-2):47-50. doi: 10.1016/j.gene.2008.08.026. Epub 2008 Sep 18.

Cited by

Random k conditional nearest neighbor for high-dimensional data.

PeerJ Comput Sci. 2025 Jan 24;11:e2497. doi: 10.7717/peerj-cs.2497. eCollection 2025.

Performance enhancement of classifiers through Bio inspired feature selection methods for early detection of lung cancer from microarray genes.

Heliyon. 2024 Aug 17;10(16):e36419. doi: 10.1016/j.heliyon.2024.e36419. eCollection 2024 Aug 30.

Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data-In Pursuit of Precision.

Bioengineering (Basel). 2024 Mar 26;11(4):314. doi: 10.3390/bioengineering11040314.

Preoperative assessment of grade, T stage, and lymph node involvement: machine learning-based CT texture analysis in colon cancer.

Jpn J Radiol. 2024 Mar;42(3):300-307. doi: 10.1007/s11604-023-01502-2. Epub 2023 Oct 24.

Evaluation and Exploration of Machine Learning and Convolutional Neural Network Classifiers in Detection of Lung Cancer from Microarray Gene-A Paradigm Shift.

Bioengineering (Basel). 2023 Aug 6;10(8):933. doi: 10.3390/bioengineering10080933.

Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets.

Methods Mol Biol. 2021;2361:291-305. doi: 10.1007/978-1-0716-1641-3_17.

Error curves for evaluating the quality of feature rankings.

PeerJ Comput Sci. 2020 Dec 7;6:e310. doi: 10.7717/peerj-cs.310. eCollection 2020.

A Neural Network Framework for Predicting the Tissue-of-Origin of 15 Common Cancer Types Based on RNA-Seq Data.

Front Bioeng Biotechnol. 2020 Aug 5;8:737. doi: 10.3389/fbioe.2020.00737. eCollection 2020.

Weighted gene expression profiles identify diagnostic and prognostic genes for lung adenocarcinoma and squamous cell carcinoma.

J Int Med Res. 2020 Mar;48(3):300060519893837. doi: 10.1177/0300060519893837. Epub 2019 Dec 19.

Construction of subtype-specific prognostic gene signatures for early-stage non-small cell lung cancer using meta feature selection methods.

Oncol Lett. 2019 Sep;18(3):2366-2375. doi: 10.3892/ol.2019.10563. Epub 2019 Jul 4.
