Guilt-by-association feature selection: identifying biomarkers from proteomic profiles.

Authors

Shin Hyunjin, Sheu Bryan, Joseph Maria, Markey Mia K

Affiliation

Department of Electrical and Computer Engineering, The University of Texas at Austin, USA.

Publication details

J Biomed Inform. 2008 Feb;41(1):124-36. doi: 10.1016/j.jbi.2007.04.003. Epub 2007 Apr 14.

DOI: 10.1016/j.jbi.2007.04.003
PMID: 17544868

Abstract

In recent years, proteomic profiling by mass spectrometry has opened up a new realm of methods for identifying potential biomarkers. Mass spectrometry data, like other proteomic and genomic data, are challenging to analyze because of their high dimensionality and the availability of few samples. Hence, feature selection is extremely important because it directly provides a list of potential biomarkers by choosing a subset of effective features that separate diseased samples from healthy ones. The rule of thumb for feature selection is that features must be discriminant and independent for the best separation of the two groups. However, in general, existing feature selection algorithms only take into account the discrimination ability of features. In this paper, we present a novel method for feature selection, guilt-by-association feature selection (GBA-FS). The algorithm makes it possible to select features that are independent as well as discriminant. After measuring similarities between features, the algorithm groups together similar features using a clustering algorithm, and selects the best representative feature from each group. As a result, it produces a list of discriminant and independent features. The efficacy of GBA-FS was extensively tested on two real-world SELDI TOF data sets. The experimental results demonstrate that GBA-FS assists in selecting more independent features as compared to a common filter type feature selection method, the t test. The results also show that GBA-FS can be used to deconvolve multiply charged states of the same protein molecules. As GBA-FS successfully identifies feature groups with similar mass values, it can also be employed as an alternative to peak detection for preprocessing the mass spectrometry data.
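
The grouping-then-selecting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure, clustering algorithm, or ranking criterion GBA-FS uses, so everything concrete here is an assumption made for illustration: absolute Pearson correlation as the similarity, average-linkage hierarchical clustering from SciPy, and the absolute two-sample t statistic as the discriminant score. The function name `gba_fs_sketch` and the parameter `n_groups` are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gba_fs_sketch(X, y, n_groups):
    """Pick one representative feature per group of similar features.

    X: (n_samples, n_features) intensity matrix with no constant columns.
    y: binary class labels (0 = healthy, 1 = diseased).
    n_groups: upper bound on the number of feature groups (assumed input).
    """
    # Discriminant score (assumption): absolute two-sample t statistic.
    t_stat, _ = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
    score = np.abs(t_stat)

    # Feature similarity (assumption): absolute Pearson correlation,
    # converted to a distance in [0, 1].
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)

    # Group similar features (assumption): average-linkage clustering.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")

    # Keep the highest-scoring feature from each group.
    selected = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        selected.append(int(members[np.argmax(score[members])]))
    return sorted(selected), labels


# Synthetic example: features 0-1 share a discriminant signal,
# features 2-3 share a non-discriminant one.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
base_a = rng.normal(size=40) + 2.0 * y
base_b = rng.normal(size=40)
X = np.column_stack([
    base_a + 0.05 * rng.normal(size=40),
    base_a + 0.05 * rng.normal(size=40),
    base_b + 0.05 * rng.normal(size=40),
    base_b + 0.05 * rng.normal(size=40),
])
selected, labels = gba_fs_sketch(X, y, n_groups=2)
print("selected features:", selected)
print("group labels:", labels)
```

A plain t-test filter would rank both copies of the discriminant signal at the top, whereas this sketch returns one feature per correlated group. That is the independence property the abstract contrasts with the t test.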

Similar articles

1
Guilt-by-association feature selection: identifying biomarkers from proteomic profiles.
J Biomed Inform. 2008 Feb;41(1):124-36. doi: 10.1016/j.jbi.2007.04.003. Epub 2007 Apr 14.
2
A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve.
Bioinformatics. 2007 Oct 15;23(20):2788-94. doi: 10.1093/bioinformatics/btm442. Epub 2007 Sep 18.
3
Independent component analysis for the extraction of reliable protein signal profiles from MALDI-TOF mass spectra.
Bioinformatics. 2008 Jan 1;24(1):63-70. doi: 10.1093/bioinformatics/btm533. Epub 2007 Nov 14.
4
Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.
J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.
5
Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching.
Bioinformatics. 2006 Sep 1;22(17):2059-65. doi: 10.1093/bioinformatics/btl355. Epub 2006 Jul 4.
6
Algorithms for alignment of mass spectrometry proteomic data.
Bioinformatics. 2005 Jul 15;21(14):3066-73. doi: 10.1093/bioinformatics/bti482. Epub 2005 May 6.
7
An extended Markov blanket approach to proteomic biomarker detection from high-resolution mass spectrometry data.
IEEE Trans Inf Technol Biomed. 2009 Mar;13(2):195-206. doi: 10.1109/TITB.2008.2007909. Epub 2008 Dec 31.
8
Improving feature detection and analysis of surface-enhanced laser desorption/ionization-time of flight mass spectra.
Proteomics. 2005 Jul;5(11):2778-88. doi: 10.1002/pmic.200401184.
9
Serum protein profiling in patients with inflammatory bowel diseases using selective solid-phase bulk extraction, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and chemometric data analysis.
Rapid Commun Mass Spectrom. 2007;21(24):4142-8. doi: 10.1002/rcm.3323.
10
SELDI-TOF-based serum proteomic pattern diagnostics for early detection of cancer.
Curr Opin Biotechnol. 2004 Feb;15(1):24-30. doi: 10.1016/j.copbio.2004.01.005.

Articles citing this paper

1
DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data.
Nat Commun. 2021 Oct 6;12(1):5849. doi: 10.1038/s41467-021-26085-2.
2
Estimation of start and stop numbers for cluster resolution feature selection algorithm: an empirical approach using null distribution analysis of Fisher ratios.
Anal Bioanal Chem. 2017 Nov;409(28):6699-6708. doi: 10.1007/s00216-017-0628-8. Epub 2017 Sep 29.
3
Serial analysis of 38 proteins during the progression of human breast tumor in mice using an antibody colocalization microarray.
Mol Cell Proteomics. 2015 Apr;14(4):1024-37. doi: 10.1074/mcp.M114.046516. Epub 2015 Feb 13.
4
Analysis of biological features associated with meiotic recombination hot and cold spots in Saccharomyces cerevisiae.
PLoS One. 2011;6(12):e29711. doi: 10.1371/journal.pone.0029711. Epub 2011 Dec 29.
5
Protein biomarkers of ovarian cancer: the forest and the trees.
Future Oncol. 2012 Jan;8(1):55-71. doi: 10.2217/fon.11.135.
6
Bioinformatic-driven search for metabolic biomarkers in disease.
J Clin Bioinforma. 2011 Jan 20;1(1):2. doi: 10.1186/2043-9113-1-2.
7
A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform.
J Zhejiang Univ Sci B. 2008 Nov;9(11):863-70. doi: 10.1631/jzus.B0820163.