Technology of Informative Feature Selection for Immunosignature Analysis.

Author information

Assistant, Department of Theoretical Foundations of Informatics; National Research Tomsk State University, 36 Lenin Avenue, Tomsk, 634050, Russia.

Associate Professor, Department of Theoretical Foundations of Informatics; National Research Tomsk State University, 36 Lenin Avenue, Tomsk, 634050, Russia; Leading Engineer, Institute of Applied Mathematics and Computer Science; National Research Tomsk State University, 36 Lenin Avenue, Tomsk, 634050, Russia.

Publication information

Sovrem Tekhnologii Med. 2021;12(5):19-25. doi: 10.17691/stm2020.12.5.02. Epub 2020 Oct 28.

Abstract

UNLABELLED

The main difficulty in working with immunosignature data in practice is their high dimensionality and the large number of uninformative or falsely informative features, both of which stem from the specifics of the technology. Achieving practically relevant quality of data analysis and classification requires taking these specifics into account. The aim of the study is to create and test a technology for effectively reducing the dimensionality of immunosignature data that provides practically relevant, high-quality classification while accounting for the properties of the data.

MATERIALS AND METHODS

The study used two normalized data sets of immunosignature analysis results obtained from a public biomedical repository. A technology for selecting informative features was proposed within the framework of the study. It consists of three successive steps: 1) breaking the multiclass task into a series of binary tasks using the "one vs all" strategy; 2) screening out falsely informative features in each binary comparison by comparing the medians of the "one" and "all" sets; 3) ranking the remaining features by their informative value and selecting the most informative ones for each binary comparison. To assess the quality of the proposed technology, we used the results of classification performed on the filtered data. A support vector machine, which has proven itself in high-dimensional classification problems, was used as the classification model.
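The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact screening rule or ranking score, so the sketch assumes a feature is kept when the median of the "one" set exceeds the median of the "all" set, and it ranks the surviving features by that median gap. The function name `select_features` and the parameter `top_k` are hypothetical.

```python
from statistics import median


def select_features(X, y, top_k):
    """Return {class_label: [feature indices]} for each one-vs-all comparison.

    X is a list of samples (each a list of feature values), y the class labels.
    The screening rule and the ranking score are assumptions for illustration.
    """
    n_features = len(X[0])
    selected = {}
    for label in sorted(set(y)):
        # Step 1: one-vs-all split for the current class.
        one = [row for row, cls in zip(X, y) if cls == label]
        rest = [row for row, cls in zip(X, y) if cls != label]

        scored = []
        for j in range(n_features):
            med_one = median(row[j] for row in one)
            med_rest = median(row[j] for row in rest)
            # Step 2 (assumed rule): drop the feature unless the "one"
            # median is higher than the "all" median.
            if med_one > med_rest:
                # Step 3 (assumed score): rank by the gap between medians.
                scored.append((med_one - med_rest, j))

        # Largest gap first; ties are broken by feature index.
        scored.sort(key=lambda item: (-item[0], item[1]))
        selected[label] = [j for _, j in scored[:top_k]]
    return selected


if __name__ == "__main__":
    X = [
        [5.0, 1.0, 1.0],
        [6.0, 1.2, 0.9],
        [1.0, 5.0, 1.1],
        [1.1, 6.0, 1.0],
        [1.0, 1.0, 1.0],
        [0.9, 1.1, 1.2],
    ]
    y = ["a", "a", "b", "b", "c", "c"]
    print(select_features(X, y, top_k=1))
```

The union of the per-class index lists would then define the reduced feature space passed to the classifier, for example a support vector machine.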

RESULTS

The proposed technology for informative feature selection proved effective: it provides high classification quality while significantly reducing the feature space. The second step eliminates approximately 50% of the features in each data set under consideration, which greatly simplifies subsequent data analysis. After the third step, with the feature space reduced to 15 features, the macro-average F1-score is 98.9% for the GSE52581 dataset. For the GSE52581 dataset, with the feature space reduced to 266 features, the macro-average F1-score is 91.3%.
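For readers unfamiliar with the metric, the macro-average F1-score is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of its size. A minimal sketch follows; the function name `macro_f1` and the toy labels are illustrative and are not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "a", "b", "c", "c", "c"]
    print(round(macro_f1(y_true, y_pred), 4))
```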

CONCLUSION

The results show that the proposed technology for informative feature selection is promising for immunosignature analysis data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2423/8596259/ea8fb6c41f5d/STM-12-5-02-f1.jpg
