
Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP).

Affiliations

Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany.

Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Frankfurt am Main, Germany.

Publication information

CPT Pharmacometrics Syst Pharmacol. 2021 Nov;10(11):1371-1381. doi: 10.1002/psp4.12704. Epub 2021 Oct 1.

DOI: 10.1002/psp4.12704
PMID: 34598320
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8592507/
Abstract

The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current tools available for this purpose often require programming skills, preprocessing tools with graphical user interfaces that can be used interactively are needed. In collaboration between data scientists and experts in bioanalytical diagnostics, a graphical software package for data preprocessing called pguIMP is proposed, which contains a fixed sequence of preprocessing steps to enable reproducible interactive data preprocessing. As an R-based package, it also allows direct integration into this data science environment without requiring any programming knowledge. The implementation of contemporary data processing methods, including machine-learning-based imputation techniques, ensures the generation of corrected and cleaned bioanalytical data sets that preserve data structures such as clusters better than is possible with classical methods. This was evaluated on bioanalytical data sets from lipidomics and drug research using k-nearest-neighbors-based imputation followed by k-means clustering and density-based spatial clustering of applications with noise (DBSCAN). The R package provides a Shiny-based web interface designed to be easy to use for users who are not data analysis experts. It is demonstrated that the spectrum of methods provided is suitable as a standard pipeline for preprocessing bioanalytical data in biomedical research domains. The R package pguIMP is freely available at the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/web/packages/pguIMP/index.html).
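
The abstract's claim that neighbor-based imputation preserves cluster structure better than classical methods can be illustrated with a small sketch. The code below is not pguIMP's implementation, and it is written in plain Python rather than R. The function name `knn_impute`, the toy data, and the distance rule (root mean squared difference over the columns both rows have observed) are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows (illustrative only).

    Distance is the root mean squared difference over the columns that
    both rows have observed, so rows with different amounts of missing
    data remain comparable. The input is left unmodified.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: two well-separated clusters, one missing value.
data = [
    [1.0, 2.0],
    [1.2, 2.2],
    [1.1, None],
    [8.0, 9.0],
    [8.2, 9.4],
]

filled = knn_impute(data, k=2)

# Classical alternative for comparison: the mean of the observed column.
observed = [r[1] for r in data if r[1] is not None]
column_mean = sum(observed) / len(observed)

print(filled[2][1])   # value taken from the two neighbors in the low cluster
print(column_mean)    # value that falls between the two clusters
```

In this toy case the neighbor-based estimate stays inside the low-valued cluster, while the column mean lands between the two clusters, which is the kind of structural distortion that a later k-means or DBSCAN step would pick up.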


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6345/8592507/690bf1c6ffd0/PSP4-10-1371-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6345/8592507/be9a9b417d76/PSP4-10-1371-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6345/8592507/b8458615faad/PSP4-10-1371-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6345/8592507/51f2366abf99/PSP4-10-1371-g003.jpg
