Clustering with position-specific constraints on variance: applying redescending M-estimators to label-free LC-MS data analysis.

Affiliation

Institute of High Energy Physics, Austrian Academy of Sciences, Vienna, Austria.

Publication details

BMC Bioinformatics. 2011 Aug 31;12:358. doi: 10.1186/1471-2105-12-358.

PMID: 21884583
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3178548/
Abstract

BACKGROUND

Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there is a large variety of clustering algorithms, very few of them can enforce constraints on the variation of attributes among the data points assigned to a given cluster. In particular, a clustering algorithm that limits the variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications, ranging from the clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography-based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics.
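To make "position-specific constraints on variance" concrete for LC-MS, here is a minimal sketch. It assumes, for illustration only, that the allowed m/z spread of a cluster grows linearly with the centroid's m/z (a relative, ppm-style mass accuracy) while the retention-time window is fixed. The `Peak` type, the 10 ppm and 0.5 min defaults, and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Peak:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time (minutes, assumed unit)


def max_deviation(centroid: Peak, ppm: float = 10.0,
                  rt_window: float = 0.5) -> Tuple[float, float]:
    """Hypothetical position-specific limits on within-cluster spread.

    The m/z limit scales with the centroid's own m/z (relative, ppm-style
    accuracy); the retention-time limit is held constant for simplicity.
    """
    return centroid.mz * ppm * 1e-6, rt_window


def fits_cluster(peak: Peak, centroid: Peak, ppm: float = 10.0,
                 rt_window: float = 0.5) -> bool:
    """True if the peak lies inside the window allowed at this centroid's position."""
    dmz, drt = max_deviation(centroid, ppm, rt_window)
    return abs(peak.mz - centroid.mz) <= dmz and abs(peak.rt - centroid.rt) <= drt
```

With these assumed defaults, a peak 0.004 m/z away from its centroid is accepted at m/z 500 (limit 0.005) but rejected at m/z 200 (limit 0.002): the same absolute deviation is acceptable or not depending on where the cluster sits.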

RESULTS

We present MEDEA (M-Estimator with DEterministic Annealing), a new unsupervised, M-estimator-based algorithm designed to enforce position-specific constraints on variance during the clustering process. We demonstrate the utility of MEDEA by applying it to the problem of "peak matching" (identifying the common LC-MS peaks across multiple samples) in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods but also yields an implementation that is significantly more efficient, and hence applicable to much larger LC-MS datasets.
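The abstract does not give MEDEA's update equations. As a rough illustration of the two ingredients it names, the sketch below estimates a single cluster centre with a redescending M-estimator whose scale depends on the centre's own m/z, and anneals that scale from wide to narrow. The Welsch (Gaussian-kernel) weight function, the ppm scale model, the cooling schedule, and all function names are assumptions chosen for illustration; this is not the authors' implementation.

```python
import math
from typing import Sequence


def position_scale(center: float, ppm: float = 10.0) -> float:
    """Hypothetical position-specific scale: an m/z spread proportional to m/z."""
    return center * ppm * 1e-6


def welsch_weight(residual: float, scale: float) -> float:
    """Weight of the Welsch (Gaussian-kernel) M-estimator.

    Its influence function psi(r) = r * w(r) falls back toward zero for large |r|
    (the defining property of a redescending M-estimator), so peaks far from the
    current centre contribute almost nothing to the update.
    """
    u = residual / scale
    return math.exp(-0.5 * u * u)


def annealed_center(values: Sequence[float], start: float, ppm: float = 10.0,
                    t_start: float = 8.0, t_end: float = 1.0, cooling: float = 0.5,
                    iters_per_step: int = 20) -> float:
    """Locate one cluster centre by iteratively reweighted averaging.

    The scale is t * position_scale(center): it is recomputed from the current
    centre at every iteration (position-specific) and shrinks as the
    temperature-like factor t is cooled from t_start down to t_end (annealing).
    """
    center = start
    t = t_start
    while True:
        for _ in range(iters_per_step):
            scale = t * position_scale(center, ppm)
            weights = [welsch_weight(v - center, scale) for v in values]
            total = sum(weights)
            if total == 0.0:  # every point is far outside the current window
                return center
            center = sum(w * v for w, v in zip(weights, values)) / total
        if t <= t_end:
            return center
        t = max(t * cooling, t_end)
```

For example, `annealed_center([500.0010, 500.0000, 499.9995, 500.0005, 500.5000], start=500.0)` settles at about 500.00025, the mean of the four nearby peaks; the peak at 500.5, standing in for a neighbouring feature, receives a negligible weight even at the widest scale. Starting wide and cooling is meant to let the centre move toward the bulk of nearby points before the narrow, position-specific window is imposed.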

CONCLUSIONS

MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, together with datasets, clustering results, and supplementary information, is available from the author's website at http://www.hephy.at/user/fru/medea/.

