Tackling the small imbalanced horizontal dataset regressions by Stability Selection and SMOGN: a case study of ventilation-free days prediction in the pediatric intensive care unit and the importance of PRISM.

Author information

Rad Milad, Rafiei Alireza, Grunwell Jocelyn, Kamaleswaran Rishikesan

Affiliations

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

Department of Computer Science and Informatics, Emory University, Atlanta, GA, USA.

Publication information

Int J Med Inform. 2025 Apr;196:105809. doi: 10.1016/j.ijmedinf.2025.105809. Epub 2025 Jan 25.

DOI: 10.1016/j.ijmedinf.2025.105809
PMID: 39893765
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11867836/
Abstract

OBJECTIVE

Regression on small, imbalanced horizontal datasets is an important problem in bioinformatics because rare but vital data points strongly affect model performance. Most clinical studies have imbalanced outcome distributions, which impairs the ability of regression and classification models to learn. When this imbalance is combined with a small number of samples, predictive performance drops further. Improving the trainability of small imbalanced datasets can substantially strengthen current prediction models that rely on a small set of valuable, expensive samples.

MATERIALS AND METHODS

We used Stability Selection to overcome the high-dimensionality problem, which arises when the sample size is small relative to the number of features, and applied it to improve the performance of the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN), an algorithm for removing target imbalance. To test the new pipeline, we used a small, imbalanced cohort of pediatric ICU patients admitted with respiratory illnesses to predict the number of Ventilator-Free Days (VFD) a patient may experience over a 28-day period.
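A minimal sketch of Stability Selection follows, written in pure Python so it needs no third-party packages. It assumes Lasso as the base selector with a single fixed penalty `alpha`, random half-sized subsamples, and a selection-frequency threshold; the function names and the values of `alpha`, `n_subsamples`, and `threshold` are illustrative assumptions, not the configuration reported by the authors. A feature is kept if its Lasso coefficient is nonzero in at least a `threshold` fraction of the subsamples.

```python
import random


def standardize(X, y):
    """Scale each feature column to zero mean and unit variance; center y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # guard against constant columns
    Xs = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    return Xs, [v - y_mean for v in y]


def lasso(X, y, alpha, n_iter=100):
    """Lasso without intercept via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            # Soft-thresholding update for coordinate j.
            if rho > alpha:
                new_w = (rho - alpha) / col_sq[j]
            elif rho < -alpha:
                new_w = (rho + alpha) / col_sq[j]
            else:
                new_w = 0.0
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def stability_selection(X, y, alpha=0.1, n_subsamples=50, threshold=0.6, seed=0):
    """Return (selected feature indices, selection frequency per feature)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # random half of the samples
        Xs, ys = standardize([X[i] for i in idx], [y[i] for i in idx])
        w = lasso(Xs, ys, alpha)
        for j in range(p):
            if abs(w[j]) > 1e-10:
                counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    selected = [j for j in range(p) if freqs[j] >= threshold]
    return selected, freqs
```

On synthetic data where only the first two of eight features drive the target, the selector is expected to keep exactly those two. In practice a library solver (for example scikit-learn's `Lasso`) would replace the hand-written coordinate descent, which is included here only to keep the sketch self-contained.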

RESULTS

By applying Stability Selection before SMOGN, our model overcame label imbalance and identified almost all of the non-surviving patients in the test dataset. Our study also highlighted the Pediatric Risk of Mortality (PRISM) score as a powerful VFD predictor when combined with other clinical features.
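SMOGN, which the results credit with overcoming label imbalance once features have been selected, combines two ways of generating synthetic rare cases: SMOTER-style interpolation between a rare case and a nearby rare neighbor, and Gaussian-noise perturbation when the neighbor is too far away to interpolate safely. The sketch below is a simplified version under stated assumptions: a caller-supplied boolean `rare` mask stands in for SMOGN's relevance function, undersampling of common cases is omitted, and the function name and the `n_per_case`, `k`, and `pert` values are illustrative rather than the authors' settings.

```python
import math
import random
import statistics


def _std(values):
    """Population standard deviation (0.0 for a single value)."""
    return statistics.pstdev(values) if len(values) > 1 else 0.0


def smogn_like_oversample(X, y, rare, n_per_case=2, k=3, pert=0.02, seed=0):
    """Simplified SMOGN-style oversampling of rare regression targets.

    For each rare seed case, pick one of its k nearest rare neighbors.
    If the neighbor lies within a "safe" distance (half the median distance
    to the k neighbors), interpolate features and target (SMOTER step);
    otherwise perturb the seed with Gaussian noise scaled by `pert`.
    Undersampling of common cases, which full SMOGN also performs, is omitted.
    """
    rng = random.Random(seed)
    rare_idx = [i for i, is_rare in enumerate(rare) if is_rare]
    if len(rare_idx) < 2:
        return list(X), list(y)
    p = len(X[0])
    feat_sd = [_std([X[i][j] for i in rare_idx]) for j in range(p)]
    y_sd = _std([y[i] for i in rare_idx])
    new_X, new_y = [], []
    for i in rare_idx:
        # k nearest rare neighbors of the seed case i
        neighbors = sorted(
            (j for j in rare_idx if j != i), key=lambda j: math.dist(X[i], X[j])
        )[:k]
        safe = statistics.median(math.dist(X[i], X[j]) for j in neighbors) / 2
        for _ in range(n_per_case):
            j = rng.choice(neighbors)
            if math.dist(X[i], X[j]) < safe:
                # SMOTER step: interpolate features, distance-weight the target
                gap = rng.random()
                x_new = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
                d_i, d_j = math.dist(x_new, X[i]), math.dist(x_new, X[j])
                total = d_i + d_j
                y_new = (d_j * y[i] + d_i * y[j]) / total if total else (y[i] + y[j]) / 2
            else:
                # Gaussian-noise step around the seed case
                x_new = [a + rng.gauss(0, pert * sd) for a, sd in zip(X[i], feat_sd)]
                y_new = y[i] + rng.gauss(0, pert * y_sd)
            new_X.append(x_new)
            new_y.append(y_new)
    return list(X) + new_X, list(y) + new_y
```

The `rare` mask could, for instance, flag low VFD values below a chosen cutoff; the original SMOGN instead derives rarity from a relevance function over the target and a relevance threshold.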

CONCLUSION

This paper shows how a hybrid strategy of Stability Selection, SMOGN, and regression can improve outcomes on highly imbalanced datasets and reduce the probability of costly false-negative detections in severe acute respiratory disease cases. The proposed modeling pipeline can reduce the overall VFD regression error and can also be extended to other regression targets. We also showed the importance of PRISM as a strong VFD predictor.
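The link between VFD regression and missed non-survivors follows from how VFD is commonly defined. Under the usual 28-day convention, a patient who dies within the window is assigned 0 VFD, and a survivor is assigned 28 minus the days spent on mechanical ventilation, floored at 0. This abstract does not state the paper's exact definition, so the sketch below assumes that convention; `missed_zero_vfd_cases` and its `cutoff` are hypothetical helpers for counting zero-VFD cases whose predicted VFD is high enough to look falsely reassuring.

```python
def ventilator_free_days(days_ventilated, died_within_window, window=28):
    """VFD under the common convention: 0 for non-survivors, else window minus ventilated days."""
    if died_within_window:
        return 0
    return max(0, window - days_ventilated)


def missed_zero_vfd_cases(y_true, y_pred, cutoff=5.0):
    """Count cases with true VFD == 0 (all non-survivors, plus survivors
    ventilated for the whole window) whose predicted VFD is at or above a
    hypothetical cutoff, i.e. potential false negatives."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p >= cutoff)
```

For example, a survivor ventilated for 10 days has 18 VFD, while a patient who dies within the window has 0 regardless of ventilation time, so a regression model that predicts high VFD for such a patient produces exactly the costly false negative described above.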
