Manufacturing and Technology Division, Bertis Inc., Heungdeok 1-Ro, Giheung-Gu, Yongin-Si, Gyeonggi-Do, 16954, Republic of Korea.
Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-Ro, Giheung-Gu, Yongin-Si, Gyeonggi-Do, 16954, Republic of Korea.
Sci Rep. 2023 Jun 2;13(1):8991. doi: 10.1038/s41598-023-36159-4.
Mass spectrometry (MS)-based proteomics is widely used for biomarker discovery. However, most biomarker candidates from the discovery phase are often discarded during validation. Such discrepancies between biomarker discovery and validation arise from several factors, chiefly differences in analytical methodology and experimental conditions. Here, we generated a peptide library that allows biomarkers to be discovered under the same settings as the validation process, thereby making the transition from discovery to validation more robust and efficient. Construction of the peptide library began with a list of 3393 proteins detectable in blood, compiled from public databases. For each protein, surrogate peptides favorable for detection by mass spectrometry were selected and synthesized. A total of 4683 synthesized peptides were spiked into neat serum and plasma samples to check their quantifiability in a 10 min liquid chromatography-MS/MS run. This led to the PepQuant library, which is composed of 852 quantifiable peptides that cover 452 human blood proteins. Using the PepQuant library, we discovered 30 candidate biomarkers for breast cancer. Among the 30 candidates, nine biomarkers (FN1, VWF, PRG4, MMP9, CLU, PRDX6, PPBP, APOC1, and CHL1) were validated. By combining the quantification values of these markers, we generated a machine learning model for predicting breast cancer, which showed an average area under the receiver operating characteristic curve (AUC) of 0.9105.
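As a minimal sketch of how an "average AUC" can be computed, the Python below evaluates the area under the ROC curve through its pairwise-ranking definition and averages it over several evaluation folds. The abstract does not state the model type or how the average was taken, so averaging over resampling folds is an assumption here, and the function names and toy scores are hypothetical illustrations rather than study data.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per fold.
    Assumption: the reported average is taken across resampling folds."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical toy folds: label 1 = case, 0 = control; the scores stand in
# for the output of a model that combines the marker quantification values.
toy_folds = [
    ([0, 1], [0.2, 0.9]),
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
]
print(mean_auc(toy_folds))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohort-sized data and keeps the definition of AUC explicit.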