
Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies.

Authors

Lee George, Rodriguez Carlos, Madabhushi Anant

Affiliation

Department of Biomedical Engineering, Rutgers The State University of New Jersey, 599 Taylor Road, Piscataway, NJ 08854, USA.

Publication

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):368-84. doi: 10.1109/TCBB.2008.36.

DOI: 10.1109/TCBB.2008.36
PMID: 18670041
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2562675/
Abstract

The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
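
The abstract's point about Euclidean distance can be illustrated with a small sketch. It is not the authors' code or data: the toy arc, the function names (`arc_points`, `knn_graph`, `geodesic`), and the choice of k = 2 neighbours are all assumptions made for illustration. The sketch builds a k-nearest-neighbour graph and measures shortest-path (geodesic) distance along it, which is the distance estimate that Isomap substitutes for straight-line distance.

```python
import heapq
import math


def arc_points(n=50, radius=1.0):
    """Sample n points along three quarters of a circle (a curved 1-D manifold in 2-D)."""
    step = 1.5 * math.pi / (n - 1)
    return [(radius * math.cos(i * step), radius * math.sin(i * step)) for i in range(n)]


def euclidean(p, q):
    """Straight-line distance, the similarity measure linear DR methods rely on."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target not reachable


if __name__ == "__main__":
    pts = arc_points()
    graph = knn_graph(pts, k=2)
    print("Euclidean distance between arc ends:", euclidean(pts[0], pts[-1]))
    print("Graph geodesic between arc ends:   ", geodesic(graph, 0, len(pts) - 1))
```

On this toy arc the two ends are about 1.41 apart in a straight line, while the graph geodesic comes out close to the arc length of 1.5π (about 4.71). A method that uses only Euclidean distance would therefore treat the two ends as far more similar than they are along the curve.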

