Data Integration: Possibilities of Molecular and Clinical Data Fusion on the Example of Thyroid Cancer Diagnostics.

Affiliations

Department of Systems Biology and Engineering, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.

Department of Technology Development, Gabos Software Sp z o.o., Mikołowska 100, 40-065 Katowice, Poland.

Publication information

Int J Mol Sci. 2022 Oct 6;23(19):11880. doi: 10.3390/ijms231911880.

Abstract

(1) Background: Data from independent gene expression sources may be integrated for the purpose of molecular diagnostics of cancer. Multiple approaches have been described so far. Here, we investigated the impact of different data fusion strategies on classification accuracy and feature selection stability, which allow the costs of diagnostic tests to be reduced.

(2) Methods: We used molecular features (gene expression) combined with a feature extracted from independent clinical data describing a patient's sample. We considered the dependencies between selected features in two data fusion strategies (early fusion and late fusion) and compared them with classification models based on molecular features only. We compared the most accurate classification models by their number of features, which is connected to the potential cost reduction of the diagnostic classifier.

(3) Results: We show that for thyroid cancer, the extracted clinical feature is correlated with (but not redundant to) the molecular data. Data fusion yields a model with similar or even higher classification quality (with a statistically significant accuracy improvement, a p-value below 0.05) and reduces the molecular dimensionality of the feature space from 15 to between 3 and 8 features (depending on the feature selection method).

(4) Conclusions: Both strategies give results of comparable quality, but the early fusion method provides better feature selection stability.
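The abstract names the two fusion strategies without showing how they differ, so a minimal Python sketch may help. It is not the authors' pipeline: the nearest-centroid scorer, the toy values in `mol` and `clin`, the weight `w`, and every function name are assumptions made for illustration only. Early fusion is shown as concatenating molecular and clinical features before fitting one model; late fusion is shown as fitting one model per data source and averaging their scores.

```python
from math import dist, exp

def centroids(X, y):
    """Return the mean feature vector of each class label."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return out

def score(cents, x):
    """Pseudo-probability of class 1 from distances to the two centroids."""
    d0, d1 = dist(x, cents[0]), dist(x, cents[1])
    return 1.0 / (1.0 + exp(d1 - d0))  # closer to class 1 gives a higher score

# Hypothetical toy data: two gene expression values and one clinical feature.
mol = [[2.1, 0.3], [1.9, 0.5], [2.3, 0.4], [0.4, 1.8], [0.6, 2.2], [0.5, 1.9]]
clin = [[0.2], [0.3], [0.1], [0.8], [0.9], [0.7]]
y = [0, 0, 0, 1, 1, 1]

def early_fusion_fit(mol, clin, y):
    """Early fusion: concatenate both feature sets, then fit one model."""
    fused = [m + c for m, c in zip(mol, clin)]
    return centroids(fused, y)

def early_fusion_predict(model, m, c):
    return score(model, m + c)

def late_fusion_fit(mol, clin, y):
    """Late fusion: fit one model per data source."""
    return centroids(mol, y), centroids(clin, y)

def late_fusion_predict(models, m, c, w=0.5):
    """Combine the per-source scores with an assumed weight w."""
    mol_model, clin_model = models
    return w * score(mol_model, m) + (1 - w) * score(clin_model, c)

early = early_fusion_fit(mol, clin, y)
late = late_fusion_fit(mol, clin, y)
print(early_fusion_predict(early, [0.5, 2.0], [0.85]))
print(late_fusion_predict(late, [0.5, 2.0], [0.85]))
```

In this sketch a score above 0.5 means the sample lies closer to the class 1 centroid. A real comparison would use the classifiers and feature selection methods described in the full paper, with proper cross-validation.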

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7175/9569592/f34264005be0/ijms-23-11880-g001.jpg
