Proteomics biomarker discovery for individualized prevention of familial pancreatic cancer using statistical learning.

Affiliations

Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.

Institute of Genetic Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.

Publication information

PLoS One. 2023 Jan 26;18(1):e0280399. doi: 10.1371/journal.pone.0280399. eCollection 2023.

Abstract

BACKGROUND

The low five-year survival rate of pancreatic ductal adenocarcinoma (PDAC) and the low rate at which imaging detects early-stage PDAC highlight the need to discover novel biomarkers and to improve current screening procedures for early diagnosis. Familial pancreatic cancer (FPC) refers to PDAC occurring in families in which at least two first-degree relatives are affected. Using innovative high-throughput proteomics, we quantified the protein profiles of at-risk individuals from FPC families at different potential pre-cancer stages. However, the high dimensionality of proteomics data challenges traditional statistical analysis tools. We therefore applied advanced statistical learning methods to strengthen the analysis and improve the interpretability of the results.

METHODS

We applied model-based gradient boosting and adaptive lasso to cope with the small, unbalanced study design. Both methods perform variable selection and model fitting simultaneously. In addition, we used stability selection to identify a stable subset of the selected biomarkers and thereby obtain even more interpretable results. At each step, we compared the performance of the different analysis pipelines and validated our approaches in simulation scenarios.
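This abstract does not describe the authors' implementation. The sketch below is a minimal NumPy illustration of the two ideas named above, and it is not the paper's code. The first part is component-wise gradient boosting with a logistic (binomial) loss. In each iteration, every single-variable linear base learner is fitted to the negative gradient, but only the best one is updated, so variable selection happens during fitting. The second part is stability selection on top of that procedure: boosting is run on many half-subsamples until q distinct variables have entered, and variables with a high selection frequency are kept. Several details are illustrative assumptions: the function names, the step length `nu`, `q = 5`, the 0.75 threshold, the stratified subsampling, and the toy data.

```python
import numpy as np


def componentwise_boost(X, y, mstop=1000, nu=0.1, max_vars=None):
    """Component-wise gradient boosting with the logistic (binomial) loss.

    Sketch only: X is an (n, p) array, y holds 0/1 labels. Coefficients
    refer to the column-centered predictors. Returns (beta, offset, selected),
    where `selected` lists distinct variables in order of first selection.
    """
    Xc = X - X.mean(axis=0)                  # center columns (no intercept in base learners)
    denom = (Xc ** 2).sum(axis=0)            # assumes no constant columns
    p_case = y.mean()                        # assumes both classes are present
    offset = np.log(p_case / (1 - p_case))   # start from the log-odds of the case rate
    beta = np.zeros(X.shape[1])
    selected = []
    for _ in range(mstop):
        eta = np.clip(offset + Xc @ beta, -30, 30)
        u = y - 1.0 / (1.0 + np.exp(-eta))   # negative gradient of the logistic loss
        slopes = Xc.T @ u / denom            # least-squares slope of u on each column
        rss = ((u[:, None] - Xc * slopes) ** 2).sum(axis=0)
        j = int(np.argmin(rss))              # best-fitting single-variable learner
        beta[j] += nu * slopes[j]            # small step in that coordinate only
        if j not in selected:
            selected.append(j)
        if max_vars is not None and len(selected) >= max_vars:
            break                            # stop once max_vars distinct variables entered
    return beta, offset, selected


def stability_selection(X, y, q=5, n_subsamples=50, threshold=0.75, seed=0):
    """Selection frequencies of boosting over stratified half-subsamples.

    Each run boosts until q distinct variables are selected; variables whose
    selection frequency reaches `threshold` form the stable set.
    """
    rng = np.random.default_rng(seed)
    cases, controls = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsamples):
        # hypothetical choice: stratify so each subsample keeps the imbalance
        idx = np.concatenate([
            rng.choice(cases, len(cases) // 2, replace=False),
            rng.choice(controls, len(controls) // 2, replace=False),
        ])
        _, _, sel = componentwise_boost(X[idx], y[idx], max_vars=q)
        for j in sel:
            counts[j] += 1
    freq = counts / n_subsamples
    return np.flatnonzero(freq >= threshold), freq


# Toy usage: 200 samples, 50 "proteins", only columns 0 and 1 informative.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
eta = -2 + 3 * X[:, 0] + 3 * X[:, 1]
y = (rng.random(200) < 1 / (1 + np.exp(-eta))).astype(float)
stable, freq = stability_selection(X, y, q=5)
print("stable set:", stable)
```

In the original stability selection framework, q and the threshold π_thr jointly bound the expected number of falsely selected variables by q² / ((2π_thr − 1) p). That bound rests on exchangeability assumptions. Because each run here selects at most q variables, at most q / π_thr variables can reach the threshold.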

RESULTS

In the simulation study, model-based gradient boosting predicted more accurately than adaptive lasso on small, unbalanced, high-dimensional datasets and identified more relevant variables. Furthermore, using model-based gradient boosting, we discovered a subset of promising serum biomarkers that may improve the current screening procedure for FPC.
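The abstract does not specify the simulation design. As a hedged illustration, the NumPy sketch below builds one small, unbalanced, high-dimensional scenario with a known set of informative variables. It also shows how a method could be scored on that scenario along the two axes mentioned above. Variable recovery is measured by the true-positive rate and the number of false positives in the selected set. Prediction is measured by the ROC AUC of the method's risk scores. The sample size, dimension, case fraction, mean-shift effect, and all function names are hypothetical.

```python
import numpy as np


def simulate_unbalanced(n=60, p=500, n_informative=5, case_fraction=0.2,
                        effect=1.0, seed=0):
    """Small, unbalanced, high-dimensional data with known signal.

    Cases come first; in cases, the first `n_informative` columns are
    shifted in mean by `effect`. All other columns are pure noise.
    """
    rng = np.random.default_rng(seed)
    n_cases = int(round(case_fraction * n))
    y = np.r_[np.ones(n_cases), np.zeros(n - n_cases)]
    X = rng.standard_normal((n, p))
    X[:n_cases, :n_informative] += effect
    return X, y, set(range(n_informative))


def selection_metrics(selected, informative):
    """True-positive rate and number of false positives of a selected set."""
    selected = set(selected)
    tpr = len(selected & informative) / len(informative)
    n_false = len(selected - informative)
    return tpr, n_false


def auc(scores, y):
    """ROC AUC via the Mann-Whitney statistic; ties count one half."""
    scores, y = np.asarray(scores, dtype=float), np.asarray(y)
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


X, y, informative = simulate_unbalanced(seed=1)
print(X.shape, int(y.sum()))                          # (60, 500) 12
print(selection_metrics({0, 1, 2, 7}, informative))   # (0.6, 1)
print(auc(X[:, :5].sum(axis=1), y))                   # oracle score, expected well above 0.5
```

In a full study, this would be repeated over many seeds and scenarios. The selected sets and risk scores from each pipeline (boosting or adaptive lasso, each with or without stability selection) would then be passed to the same metrics.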

CONCLUSION

Advanced statistical learning methods helped us overcome the shortcomings of an unbalanced study design in a valuable clinical dataset. The discovered serum biomarkers provide a clear direction for further investigation and for more precise clinical hypotheses about the development of FPC and optimal strategies for its early detection.
