mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

Authors

Lee Geunho, Lee Hyun Beom, Jung Byung Hwa, Nam Hojung

Affiliations

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Korea.

Molecular Recognition Research Center, Korea Institute of Science and Technology (KIST), Seoul, Korea.

Publication

FEBS Open Bio. 2017 Jun 19;7(7):1051-1059. doi: 10.1002/2211-5463.12247. eCollection 2017 Jul.

DOI:10.1002/2211-5463.12247
PMID:28680817
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5494294/

Abstract

Mass spectrometry (MS) data are used to analyze biological phenomena at the level of chemical species. However, these data often contain unexpected duplicate records and missing values arising from technical or biological factors. These 'dirty data' problems make MS analyses harder because they degrade performance when statistical tests or machine-learning methods are applied to the data. We therefore developed missing values preprocessor (mvp), an open-source tool for preprocessing data that may contain duplicate records and missing values. mvp exploits a property of MS data: identical chemical species show the same or similar values for key identifiers, such as the mass-to-charge ratio and the intensity signal. It uses graph theory to form cliques of such records and processes the dirty data on that basis. We evaluated the validity of the mvp process through quantitative and qualitative analyses and compared the results of a statistical test applied to the original data and to the mvp-processed data. This analysis showed that mvp reduces the problems associated with duplicate records and missing values. We also examined how unprocessed data affect statistical test results and how those results improve when the data are first preprocessed with mvp.
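The clique idea in the abstract can be illustrated with a small sketch. This is not the authors' mvp implementation. The edge rule (`is_same_species`), the tolerances `mz_tol` and `rel_int_tol`, the greedy largest-clique selection, and the mean-based merge are all hypothetical choices made for illustration. It also assumes that duplicate rows for one species tend to carry complementary missing entries. Each record becomes a graph node. An edge joins two records whose m/z values agree and whose intensities agree wherever both were observed. Maximal cliques are enumerated with the Bron-Kerbosch algorithm, and each selected clique is collapsed into one record whose missing intensities are filled from the other members.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Record:
    """One MS feature: an m/z value plus per-sample intensities (None = missing)."""
    mz: float
    intensities: list[Optional[float]]


def is_same_species(a: Record, b: Record, mz_tol: float, rel_int_tol: float) -> bool:
    """Hypothetical edge rule: m/z within mz_tol, and every sample observed in
    both records has intensities within rel_int_tol (relative) of each other."""
    if abs(a.mz - b.mz) > mz_tol:
        return False
    for x, y in zip(a.intensities, b.intensities):
        if x is not None and y is not None:
            if abs(x - y) > rel_int_tol * max(abs(x), abs(y)):
                return False
    return True


def maximal_cliques(adj: dict[int, set[int]]) -> list[set[int]]:
    """Bron-Kerbosch with pivoting: returns every maximal clique of the graph."""
    cliques: list[set[int]] = []

    def expand(r: set[int], p: set[int], x: set[int]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def merge_duplicates(records: list[Record], mz_tol: float = 0.01,
                     rel_int_tol: float = 0.1) -> list[Record]:
    """Collapse each selected clique of likely-duplicate records into one record."""
    n = len(records)
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if is_same_species(records[i], records[j], mz_tol, rel_int_tol):
                adj[i].add(j)
                adj[j].add(i)

    # Maximal cliques may overlap; greedily take the largest remaining one
    # (ties broken by smallest indices) so every record lands in one group.
    remaining = maximal_cliques(adj)
    assigned: set[int] = set()
    groups: list[list[int]] = []
    while remaining:
        best = max(remaining, key=lambda c: (len(c), tuple(-i for i in sorted(c))))
        groups.append(sorted(best))
        assigned |= best
        remaining = [c - assigned for c in remaining if c - assigned]

    merged: list[Record] = []
    for group in groups:
        members = [records[i] for i in group]
        columns: list[Optional[float]] = []
        for s in range(len(members[0].intensities)):
            observed = [m.intensities[s] for m in members if m.intensities[s] is not None]
            # Fill a missing sample from the other duplicates; stay None if none observed it.
            columns.append(mean(observed) if observed else None)
        merged.append(Record(mz=mean(m.mz for m in members), intensities=columns))
    merged.sort(key=lambda r: r.mz)
    return merged


if __name__ == "__main__":
    rows = [
        Record(100.000, [10.0, None, 12.0]),
        Record(100.004, [None, 11.0, 12.5]),  # same species, split across two rows
        Record(250.500, [5.0, 5.5, None]),
    ]
    for r in merge_duplicates(rows):
        print(round(r.mz, 3), r.intensities)
```

With this toy input, the two rows near m/z 100 collapse into a single record with intensities `[10.0, 11.0, 12.25]`, while the row at m/z 250.5 is kept unchanged, including its missing third value. Greedy largest-clique selection is only one simple way to turn overlapping maximal cliques into a partition and is not guaranteed to be optimal.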

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b278/5494294/b29170ba18f7/FEB4-7-1051-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b278/5494294/a23aec83d3a1/FEB4-7-1051-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b278/5494294/020edec9a206/FEB4-7-1051-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b278/5494294/be1a57cb51b6/FEB4-7-1051-g003.jpg