On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve.

Authors

Ye Jingjing, Liu Hao, Kirmiz Crystal, Lebrilla Carlito B, Rocke David M

Affiliation

Department of Statistics, University of California, Davis, Davis, CA, 95616, USA.

Publication

BMC Bioinformatics. 2007 Dec 12;8:477. doi: 10.1186/1471-2105-8-477.

DOI:10.1186/1471-2105-8-477
PMID:18076765
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2211327/
Abstract

BACKGROUND

Novel molecular and statistical methods for disease diagnosis and prognosis are in rising demand, driven by recent advances in biotechnology. High-resolution mass spectrometry (MS) is one such technology with strong promise for improving health outcomes. Previous studies have identified proteomic biomarkers that can distinguish healthy individuals from cancer patients using MS data. This paper presents an MS study that uses glycomics to identify ovarian cancer. Glycomics is the study of glycans and glycoproteins. The glycans on proteins may differ between a cancer cell and a normal cell, and the difference may be visible in the blood. High-resolution MS has been applied to measure the relative abundances of potential glycan biomarkers in human serum, and multiple potential glycan biomarkers are measured in the MS spectra. With the objective of maximizing the empirical area under the ROC curve (AUC), an analysis method was considered that combines potential glycan biomarkers for the diagnosis of cancer.
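As an illustration of the quantity being maximized, the empirical AUC of a linear marker combination can be computed as the fraction of case-control pairs that the combined score ranks correctly. The sketch below is a minimal NumPy version. The function name `empirical_auc`, the array layout (rows are subjects, columns are glycan markers), and the half credit given to tied pairs are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def empirical_auc(beta, X_case, X_control):
    """Empirical AUC of the linear score X @ beta (hypothetical helper).

    X_case:    (n_case, p) marker intensities for diseased subjects
    X_control: (n_control, p) marker intensities for healthy subjects
    """
    s_case = X_case @ beta
    s_ctrl = X_control @ beta
    # All pairwise score differences, shape (n_case, n_control).
    diff = s_case[:, None] - s_ctrl[None, :]
    # 0-1 indicator per pair; ties get half credit (Mann-Whitney convention).
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

Because each pair contributes a 0-1 indicator, this function is piecewise constant in `beta`, which is why it cannot be maximized directly by gradient methods.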

RESULTS

Maximizing the empirical AUC of glycomics MS data is a high-dimensional optimization problem. The technical difficulty is that the empirical AUC function is not continuous: it is in fact an empirical 0-1 loss function with a large number of linear predictors. An approach was investigated that regularizes the AUC while replacing the 0-1 loss function with a smooth surrogate function. The constrained threshold gradient descent regularization (TGDR) algorithm was applied, with the regularization parameters chosen by cross-validation and the confidence intervals of the regression parameters estimated by the bootstrap. The method is called the TGDR-AUC algorithm. Its properties were studied through a numerical simulation that incorporates the positive values of MS data and the correlations between measurements within a person. The simulation supported the asymptotic property that the estimated AUC approaches the true AUC. Finally, MS data on serum glycans for ovarian cancer diagnosis were analyzed. The optimal combination based on the TGDR-AUC algorithm yields a plausible result, and the detected biomarkers are confirmed by biological evidence.
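The idea of smoothing the 0-1 loss and then applying threshold gradient updates can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes a sigmoid surrogate with a hypothetical bandwidth `sigma` (the abstract does not name the surrogate), it omits the constraint used in the constrained variant, and the names `smoothed_auc_and_grad`, `tgdr_auc`, `tau`, `step`, and `n_steps` are chosen for this example. At each iteration only the coefficients whose gradient magnitude is at least `tau` times the largest one are updated, so `tau` and the number of steps act as the regularization parameters.

```python
import numpy as np

def smoothed_auc_and_grad(beta, X_case, X_control, sigma=0.1):
    """Sigmoid-smoothed AUC and its gradient with respect to beta."""
    # Pairwise differences of marker vectors, shape (n_case, n_control, p).
    D = X_case[:, None, :] - X_control[None, :, :]
    z = (D @ beta) / sigma
    s = 0.5 * (1.0 + np.tanh(0.5 * z))   # numerically stable sigmoid(z)
    w = s * (1.0 - s) / sigma            # derivative of sigmoid(u / sigma)
    grad = np.tensordot(w, D, axes=([0, 1], [0, 1])) / w.size
    return float(s.mean()), grad

def tgdr_auc(X_case, X_control, tau=0.5, step=0.01, n_steps=200, sigma=0.1):
    """Threshold gradient ascent on the smoothed AUC (unconstrained sketch)."""
    beta = np.zeros(X_case.shape[1])
    for _ in range(n_steps):
        _, g = smoothed_auc_and_grad(beta, X_case, X_control, sigma)
        g_max = np.max(np.abs(g))
        if g_max == 0.0:
            break
        # Update only coordinates whose gradient is within a factor tau
        # of the largest one; tau near 1 gives sparser solutions.
        mask = np.abs(g) >= tau * g_max
        beta += step * mask * g
    return beta
```

In practice `tau` and `n_steps` would be selected by cross-validation and the variability of `beta` assessed by resampling subjects with the bootstrap, as the abstract describes; both steps are left out here to keep the sketch short.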

CONCLUSION

The TGDR-AUC algorithm relaxes the normality and independence assumptions made in previous work. In addition to being flexible and easy to interpret, the algorithm performs well in combining potential biomarkers and is computationally feasible. Thus, TGDR-AUC is a plausible approach for classifying disease status on the basis of multiple biomarkers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/16397e57e6fe/1471-2105-8-477-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/9765d6841ceb/1471-2105-8-477-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/a2453ff7c5c1/1471-2105-8-477-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/d02dcdb37966/1471-2105-8-477-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/792354b2fc38/1471-2105-8-477-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/0c8fafa5f115/1471-2105-8-477-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a32/2211327/551030007c64/1471-2105-8-477-7.jpg
