Pezoulas Vasileios C, Kourou Konstantina D, Kalatzis Fanis, Exarchos Themis P, Zampeli Evi, Gandolfo Saviana, Goules Andreas, Baldini Chiara, Skopouli Fotini, De Vita Salvatore, Tzioufas Athanasios G, Fotiadis Dimitrios I
Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, GR45110 Ioannina, Greece.
Department of Biological Applications and Technology, University of Ioannina, GR45110 Ioannina, Greece.
IEEE Open J Eng Med Biol. 2020 Mar 16;1:83-90. doi: 10.1109/OJEMB.2020.2981258. eCollection 2020.
This work presents a framework for data sharing, curation, harmonization, and federated data analytics that addresses open issues in healthcare, such as the development of robust disease prediction models. Data curation is applied to remove data inconsistencies. Lexical and semantic matching methods are used to align the structure of the heterogeneous, curated cohort data. Incremental learning algorithms, including class imbalance handling and hyperparameter optimization, are used to enable the development of disease prediction models. The applicability of the framework is demonstrated in a case study of primary Sjögren's Syndrome, yielding harmonized data with increased quality and more than 85% agreement, along with lymphoma prediction models with more than 80% sensitivity and specificity. The framework provides data quality, harmonization, and analytics workflows that can enhance the statistical power of heterogeneous clinical data and enable the development of robust models for disease prediction.
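The lexical matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a similarity measure, so the character-based ratio from Python's standard difflib module is swapped in here, and the function names, the 0.8 threshold, and the sample terms are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase a term and treat underscores and hyphens as spaces."""
    return " ".join(term.lower().replace("_", " ").replace("-", " ").split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized terms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_terms(cohort_terms, reference_terms, threshold=0.8):
    """Map each cohort term to its closest reference term.

    A term whose best score falls below `threshold` maps to None.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    matches = {}
    for term in cohort_terms:
        best = max(reference_terms, key=lambda ref: similarity(term, ref))
        matches[term] = best if similarity(term, best) >= threshold else None
    return matches


# Hypothetical reference vocabulary and cohort column names.
reference = ["age at diagnosis", "lymphoma", "rheumatoid factor"]
cohort = ["Age_at_Diagnosis", "Lymphoma", "rheumatoid-factor", "zip code"]

result = match_terms(cohort, reference)
for cohort_term, reference_term in result.items():
    print(f"{cohort_term!r} -> {reference_term!r}")
```

Purely lexical matching of this kind only catches spelling and formatting variants. Terms that differ in wording but share a meaning would need the semantic matching that the abstract mentions alongside it.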