Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes.

Affiliation

Computational Bioscience Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University for Science and Technology (KAUST), Jeddah, Kingdom of Saudi Arabia.

Publication information

Bioinformatics. 2010 Sep 15;26(18):i531-9. doi: 10.1093/bioinformatics/btq376.

DOI:10.1093/bioinformatics/btq376

PMID:20823318

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2935424/

Abstract

MOTIVATION

Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology and are difficult to investigate. Unsupervised hybrid-two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance, not only for unsupervised data classification but also for visualization of the patterns hidden in high-dimensional feature space.

METHODS

'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests approximating curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids introducing any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering.
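
The MC principle stated above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mc_distances` and `embed_classical_mds` are hypothetical, Euclidean distance is assumed as the base metric, and classical multidimensional scaling is used here only as a stand-in for the embedding step, which the abstract does not specify.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mc_distances(X):
    """Pairwise distances measured along the minimum spanning tree of X.

    X has shape (n_samples, n_features). Euclidean distance is an
    assumption of this sketch, not something stated in the abstract.
    """
    D = squareform(pdist(X, metric="euclidean"))
    # SciPy treats zero entries as missing edges, so exact duplicate
    # samples would need separate handling (not covered here).
    mst = minimum_spanning_tree(D)
    # Path length between every pair of samples along the tree edges.
    return shortest_path(mst, method="D", directed=False)


def embed_classical_mds(D, n_components=2):
    """Classical MDS of a distance matrix (stand-in for the embedding step)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues can occur for non-Euclidean distances; clip them.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Four samples lying along a bent path.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.5]])
D_mc = mc_distances(X)
Y = embed_classical_mds(D_mc, n_components=2)
```

In this toy example the straight-line distance between the first and last sample is 1.5, while the distance along the tree follows the bend and is longer. No tuning parameter appears anywhere in the distance computation, which is the property the abstract emphasizes.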

RESULTS

Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin.

CONCLUSION

MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.

AVAILABILITY

https://sites.google.com/site/carlovittoriocannistraci/home.
