PerSEveML: A Web-Based Tool to Identify Persistent Biomarker Structure for Rare Events Using Integrative Machine Learning Approach.

Author information

Dutta Sreejata, Mudaranthakam Dinesh Pal, Li Yanming, Sardiu Mihaela E

Affiliations

Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA.

University of Kansas Cancer Center, Kansas City, USA.

Publication information

bioRxiv. 2023 Oct 30:2023.10.25.564000. doi: 10.1101/2023.10.25.564000.

DOI:10.1101/2023.10.25.564000

PMID:38196661

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10775315/

Abstract

Omics datasets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these datasets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet few bioinformatics tools integrate ML techniques to help explain the underlying biology. Building on our previously developed integrative machine learning framework, we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which visually organize input features into a persistent structure of selected, unselected, and fluctuating categories and so help researchers form meaningful hypotheses about the underlying biology. We evaluated PerSEveML on three diverse, biologically complex datasets, ranging from small to large scale and each containing extremely rare events, and demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
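
The abstract does not define the entropy or rank scores, so the following is only an illustrative sketch of how an entropy score could sort features into selected, unselected, and fluctuating groups. It assumes each ML method casts a binary vote (1 = feature selected, 0 = not selected) and uses the Shannon entropy of the vote fraction. The function names, the feature names, and the 0.5-bit cutoff are hypothetical and are not taken from PerSEveML.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def categorize(selections: dict[str, list[int]],
               entropy_cutoff: float = 0.5) -> dict[str, str]:
    """Label each feature from its per-method selection votes.

    Assumption: high entropy means the methods disagree (fluctuating);
    low entropy means they agree, and the vote fraction decides between
    selected and unselected. The cutoff is an arbitrary illustrative value.
    """
    categories = {}
    for feature, votes in selections.items():
        p = sum(votes) / len(votes)  # fraction of methods selecting the feature
        if binary_entropy(p) > entropy_cutoff:
            categories[feature] = "fluctuating"
        elif p >= 0.5:
            categories[feature] = "selected"
        else:
            categories[feature] = "unselected"
    return categories


# Hypothetical votes from ten ML methods for three made-up features.
votes = {
    "marker_A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "marker_B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "marker_C": [1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
}
result = categorize(votes)
print(result)
```

In this toy input, unanimous votes give zero entropy, while an even split gives the maximum of one bit, which is why a feature the methods disagree on lands in the fluctuating group.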
