Multicohort study testing the generalisability of the SASKit-ML stroke and PDAC prognostic model pipeline to other chronic diseases.

Affiliations

Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany.

Department of Medicine, Clinic III, Hematology, Oncology, Palliative Medicine, Rostock University Medical Center, Rostock, Germany.

Publication information

BMJ Open. 2024 Sep 30;14(9):e088181. doi: 10.1136/bmjopen-2024-088181.

Abstract

OBJECTIVES

To validate and test the generalisability of the SASKit-ML pipeline (a prepublished feature selection and machine learning pipeline for predicting health deterioration after a stroke or pancreatic adenocarcinoma event) by using it to identify biomarkers of health deterioration in chronic diseases.

DESIGN

This is a validation study using a predefined protocol applied to multiple publicly available datasets, including longitudinal data from cohorts with type 2 diabetes (T2D), inflammatory bowel disease (IBD), rheumatoid arthritis (RA) and various cancers. The datasets were chosen to mimic as closely as possible the SASKit cohort, a prospective, longitudinal cohort study.

DATA SOURCES

Public data were used from the T2D (77 patients with potential pre-diabetes and 18 controls) and IBD (49 patients with IBD and 12 controls) branches of the Human Microbiome Project (HMP), RA Map (RA-MAP; 92 patients with RA and 22 controls) and The Cancer Genome Atlas (TCGA; 16 cancers).

METHODS

Data integration steps were performed in accordance with the prepublished study protocol, generating features to predict disease outcomes using 10-fold cross-validated random survival forests.
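The models are evaluated with 10-fold cross-validation. As a minimal sketch, the splitting step alone can be written in pure Python: shuffle participant indices and partition them into ten disjoint test folds, each paired with the remaining indices as training data. The function name `kfold_splits`, the fixed seed and splitting at the participant level are assumptions for illustration, not details taken from the protocol; fitting the random survival forest itself is not shown.

```python
import random


def kfold_splits(n_samples, k=10, seed=42):
    """Hypothetical helper: split indices 0..n_samples-1 into k disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs; every index
    appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducible folds
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = sorted(indices[start:start + size])
        train = sorted(indices[:start] + indices[start + size:])
        splits.append((train, test))
        start += size
    return splits


# Illustrative size only: 95 participants (77 + 18, as in HMP-T2D).
splits = kfold_splits(95, k=10)
print([len(test) for _, test in splits])  # [10, 10, 10, 10, 10, 9, 9, 9, 9, 9]
```

With longitudinal data that contain several samples per person, splitting participant indices rather than individual samples keeps one person's visits from landing in both the training and the test fold of the same split.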

OUTCOME MEASURES

Health deterioration was assessed using disease-specific clinical markers and endpoints in each cohort. In the HMP-T2D cohort, worsening of glycated haemoglobin (HbA1c; 5.7% or more), fasting plasma glucose (at least 100 mg/dL) and oral glucose tolerance test results (at least 140 mg/dL) was considered. In the HMP-IBD cohort, an event was defined as a worsening of at least 3 points in a disease-specific severity measure, either the "Simple Clinical Colitis Activity Index" or the "Harvey-Bradshaw Index". In the RA-MAP cohort, the outcome was defined as a worsening of the "Disease Activity Score 28" or the "Simple Disease Activity Index" by at least five points; worsening of the "Health Assessment Questionnaire" score and an increase in the number of swollen or tender joints were also evaluated. Finally, the outcome for all TCGA datasets was the progression-free interval.
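A survival model needs each participant's longitudinal visits reduced to an observed time and an event flag. The sketch below encodes the T2D thresholds, the IBD 3-point rule and the RA 5-point rule as predicates and converts a visit series into a right-censored `(time, event)` pair. Everything here is hypothetical: the record layout and function names, treating the first visit as baseline, reading "worsening" for T2D as reaching any threshold at a follow-up visit, and the mg/dL unit for the OGTT value. The Health Assessment Questionnaire and joint-count criteria are not encoded.

```python
from typing import Callable, Dict, List, Tuple

Visit = Dict[str, float]  # hypothetical record, e.g. {"day": 90, "hba1c": 5.8}

# HbA1c in %, FPG in mg/dL, OGTT in mg/dL (OGTT unit assumed).
T2D_THRESHOLDS = {"hba1c": 5.7, "fpg": 100.0, "ogtt": 140.0}


def t2d_worsened(baseline: Visit, visit: Visit) -> bool:
    # Assumption: reaching any threshold at a follow-up visit is an event.
    # `baseline` is unused but kept so all rules share one signature.
    return any(visit.get(k) is not None and visit[k] >= cut
               for k, cut in T2D_THRESHOLDS.items())


def ibd_worsened(baseline: Visit, visit: Visit) -> bool:
    # "severity" holds the SCCAI or HBI value; a rise of >= 3 points is an event.
    return visit["severity"] - baseline["severity"] >= 3


def ra_worsened(baseline: Visit, visit: Visit) -> bool:
    # DAS28 or SDAI worsening by >= 5 points; only scores present at both visits count.
    return any(k in baseline and k in visit and visit[k] - baseline[k] >= 5
               for k in ("das28", "sdai"))


def time_to_event(visits: List[Visit],
                  worsened: Callable[[Visit, Visit], bool]) -> Tuple[float, int]:
    """Return (time, 1) at the first follow-up visit meeting the rule,
    otherwise (last follow-up time, 0), i.e. a right-censored observation."""
    ordered = sorted(visits, key=lambda v: v["day"])
    baseline = ordered[0]
    for visit in ordered[1:]:
        if worsened(baseline, visit):
            return visit["day"], 1
    return ordered[-1]["day"], 0


ibd = [{"day": 0, "severity": 4}, {"day": 90, "severity": 5}, {"day": 180, "severity": 8}]
print(time_to_event(ibd, ibd_worsened))  # (180, 1)
```

Passing the rule as a function keeps the censoring logic identical across cohorts, so only the disease-specific predicate changes.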

RESULTS

Models for the prediction of health deterioration in T2D, IBD, RA and 16 cancers were produced. The T2D (C-index 0.633, Integrated Brier Score (IBS) 0.107) and RA (C-index 0.654, IBS 0.150) models were modestly predictive. The IBD model was uninformative. The TCGA models generally showed modest predictive power.
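The results are reported as Harrell's C-index (0.5 corresponds to random ranking, 1.0 to perfect ranking) and the Integrated Brier Score (IBS; lower is better). The following is a minimal pure-Python sketch of both metrics for right-censored data, written for illustration only; the abstract does not say which implementation the pipeline used. Assumptions: higher risk scores mean an earlier expected event, predicted survival curves are supplied as callables, the Brier score uses inverse-probability-of-censoring weights from a Kaplan-Meier estimate of the censoring distribution, the integral is a trapezoidal average over a user-chosen time grid, and pairs with tied times are skipped in the C-index.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Share of comparable pairs ranked correctly by predicted risk.

    A pair is comparable when the subject with the shorter observed time had
    an event; tied risks count 0.5. Tied times are skipped (simplification).
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a: shorter time
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable


def censoring_km(times, events):
    """Kaplan-Meier estimate G(t) = P(C > t); censorings act as the 'events'."""
    grid = sorted(set(times))
    steps, g = [], 1.0
    for s in grid:
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, e in zip(times, events) if t == s and not e)
        g *= 1.0 - censored / at_risk
        steps.append(g)

    def G(t, left=False):
        # left=True returns the left limit G(t-), used at event times.
        k = bisect_left(grid, t) if left else bisect_right(grid, t)
        return steps[k - 1] if k > 0 else 1.0
    return G


def brier_score(t, times, events, surv, G):
    """IPCW Brier score at time t; surv[i](t) is subject i's predicted survival."""
    total = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        s = surv[i](t)
        if ti <= t and ei:    # event by t: ideal prediction is 0
            total += s ** 2 / G(ti, left=True)
        elif ti > t:          # still event-free at t: ideal prediction is 1
            total += (1.0 - s) ** 2 / G(t)
        # subjects censored before t contribute nothing directly
    return total / len(times)


def integrated_brier_score(grid, times, events, surv):
    """Trapezoidal average of the Brier score over `grid` (keep it within follow-up)."""
    G = censoring_km(times, events)
    bs = [brier_score(t, times, events, surv, G) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (bs[k] + bs[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


print(harrell_c_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
flat = [lambda t: 0.5] * 4
print(integrated_brier_score([1, 3, 5, 7], [2, 4, 6, 8], [1, 1, 1, 1], flat))  # 0.25
```

With no censoring, a constant survival prediction of 0.5 scores exactly 0.25 at every time point, while perfect step-function predictions score 0.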

CONCLUSIONS

The SASKit-ML pipeline produces informative and useful features with the power to predict health deterioration in a variety of diseases and cancers; however, this performance is disease-dependent.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29d3/11448215/3dd86d6b3492/bmjopen-14-9-g001.jpg