A computational pipeline for data augmentation towards the improvement of disease classification and risk stratification models: A case study in two clinical domains.

Authors

Pezoulas Vasileios C, Grigoriadis Grigoris I, Gkois George, Tachos Nikolaos S, Smole Tim, Bosnić Zoran, Pičulin Matej, Olivotto Iacopo, Barlocco Fausto, Robnik-Šikonja Marko, Jakovljevic Djordje G, Goules Andreas, Tzioufas Athanasios G, Fotiadis Dimitrios I

Affiliations

Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece.

Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia.

Publication information

Comput Biol Med. 2021 Jul;134:104520. doi: 10.1016/j.compbiomed.2021.104520. Epub 2021 Jun 6.

DOI:10.1016/j.compbiomed.2021.104520
PMID:34118751
Abstract

Virtual population generation is an emerging field in data science with numerous applications in healthcare, particularly the augmentation of clinical research databases that have insufficient population sizes. However, the impact of data augmentation on the development of AI (artificial intelligence) models that address unmet clinical needs has not yet been investigated. In this work, we assess, for the first time in the literature, whether aggregating real with virtual patient data can improve the performance of existing risk stratification and disease classification models in two rare clinical domains, namely primary Sjögren's Syndrome (pSS) and hypertrophic cardiomyopathy (HCM). To do so, multivariate approaches, such as the multivariate normal distribution (MVND), and straightforward ones, such as Bayesian networks, artificial neural networks (ANNs), and tree ensembles, are compared in terms of their ability to generate high-quality virtual data. Both boosting and bagging algorithms, such as Gradient boosting trees (XGBoost), AdaBoost, and Random Forests (RFs), were trained on the augmented data to evaluate the performance improvement for lymphoma classification and HCM risk stratification. Our results revealed the favorable performance of the tree ensemble generators in both domains, yielding virtual data with a goodness-of-fit of 0.021 and a KL-divergence of 0.029 in pSS, and 0.029 and 0.027, respectively, in HCM. Applying XGBoost to the augmented data increased accuracy by 10.9%, sensitivity by 10.7%, and specificity by 11.5% for lymphoma classification, and accuracy by 16.1%, sensitivity by 16.9%, and specificity by 13.7% for HCM risk stratification.
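As a rough illustration of the MVND approach and the KL-divergence check named in the abstract, the sketch below fits a multivariate normal to a small "real" cohort, samples virtual patients from it, and compares one feature's distribution between the two. It is a minimal sketch under stated assumptions, not the authors' pipeline: the two features (`age`, `marker`), the toy cohort, the histogram-based KL estimate, and all function names are hypothetical, and the abstract does not say how the published metrics were computed.

```python
import math
import random


def fit_mvnd(rows):
    """Estimate the mean vector and sample covariance matrix (rows = patients)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular factor L with L * L^T = a (assumes a is positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_virtual(mean, cov, n, seed=0):
    """Draw n virtual patients as mean + L z, with z standard normal."""
    rng = random.Random(seed)
    low = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


def kl_divergence(real_col, virtual_col, bins=10, eps=1e-9):
    """Histogram estimate of KL(real || virtual) for one feature on shared bins."""
    lo = min(min(real_col), min(virtual_col))
    hi = max(max(real_col), max(virtual_col))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def hist(col):
        counts = [eps] * bins  # eps avoids log(0) for empty bins
        for v in col:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(real_col), hist(virtual_col)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical toy cohort: two correlated continuous features per patient.
_rng = random.Random(42)
real = []
for _ in range(200):
    age = _rng.gauss(55.0, 10.0)
    marker = 0.5 * age + _rng.gauss(0.0, 5.0)
    real.append([age, marker])

mean, cov = fit_mvnd(real)
virtual = sample_virtual(mean, cov, 300, seed=1)
augmented = real + virtual  # real and virtual records pooled for model training
kl_age = kl_divergence([r[0] for r in real], [v[0] for v in virtual])
```

A lower `kl_age` indicates that the virtual feature distribution tracks the real one more closely. In the study's setting, a classifier would then be trained on the pooled records and compared against one trained on the real records alone.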

Similar articles

1
A computational pipeline for data augmentation towards the improvement of disease classification and risk stratification models: A case study in two clinical domains.
Comput Biol Med. 2021 Jul;134:104520. doi: 10.1016/j.compbiomed.2021.104520. Epub 2021 Jun 6.
2
Variational Gaussian Mixture Models with robust Dirichlet concentration priors for virtual population generation in hypertrophic cardiomyopathy: a comparison study.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1674-1677. doi: 10.1109/EMBC46164.2021.9629653.
3
A federated AI strategy for the classification of patients with Mucosa Associated Lymphoma Tissue (MALT) lymphoma across multiple harmonized cohorts.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1666-1669. doi: 10.1109/EMBC46164.2021.9630014.
4
Classification of genomic islands using decision trees and their ensemble algorithms.
BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2164-11-S2-S1.
5
Learning ensembles of neural networks by means of a Bayesian artificial immune system.
IEEE Trans Neural Netw. 2011 Feb;22(2):304-16. doi: 10.1109/TNN.2010.2096823. Epub 2010 Dec 23.
6
A Theoretical Analysis of Why Hybrid Ensembles Work.
Comput Intell Neurosci. 2017;2017:1930702. doi: 10.1155/2017/1930702. Epub 2017 Jan 31.
7
Artificial Intelligence Algorithms to Diagnose Glaucoma and Detect Glaucoma Progression: Translation to Clinical Practice.
Transl Vis Sci Technol. 2020 Oct 15;9(2):55. doi: 10.1167/tvst.9.2.55. eCollection 2020 Oct.
8
Generation of virtual patient data for in-silico cardiomyopathies drug development using tree ensembles: a comparative study.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5343-5346. doi: 10.1109/EMBC44109.2020.9176567.
9
BgN-Score and BsN-Score: bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes.
BMC Bioinformatics. 2015;16 Suppl 4(Suppl 4):S8. doi: 10.1186/1471-2105-16-S4-S8. Epub 2015 Feb 23.
10
Addressing the clinical unmet needs in primary Sjögren's Syndrome through the sharing, harmonization and federated analysis of 21 European cohorts.
Comput Struct Biotechnol J. 2022 Jan 7;20:471-484. doi: 10.1016/j.csbj.2022.01.002. eCollection 2022.

Cited by

1
Synthetic data generation methods in healthcare: A review on open-source tools and methods.
Comput Struct Biotechnol J. 2024 Jul 9;23:2892-2910. doi: 10.1016/j.csbj.2024.07.005. eCollection 2024 Dec.
2
CADUCEO: A Platform to Support Federated Healthcare Facilities through Artificial Intelligence.
Healthcare (Basel). 2023 Aug 4;11(15):2199. doi: 10.3390/healthcare11152199.
3
A practical solution to estimate the sample size required for clinical prediction models generated from observational research on data.
Eur Radiol Exp. 2022 Jun 1;6(1):22. doi: 10.1186/s41747-022-00276-y.