A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA.

Author information

School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.

Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.

Publication information

BMC Med Genomics. 2019 Jan 31;12(Suppl 1):14. doi: 10.1186/s12920-018-0451-x.

Abstract

BACKGROUND

Many studies have sought gene signatures that distinguish cancer patients from normal controls. However, how to extract robust gene features remains an open question.

METHODS

In this work, a gene signature selection strategy for TCGA data was proposed that integrates gene expression data, methylation data, and prior knowledge about cancer biomarkers. Unlike traditional integration methods, it uses expanded 450 K methylation data instead of the original 450 K array data, and it weights reported biomarkers during feature selection. A fuzzy rule based classification method and a cross validation strategy were applied in model construction for performance evaluation.
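The abstract does not give the scoring formula, the weighting scheme, or the form of the fuzzy rules, so the following is only a minimal sketch of the general idea under stated assumptions. The effect-size score, the `prior_weight` value, the linear membership function, and all gene names and numbers are hypothetical and are not taken from the paper. Cross validation is omitted for brevity.

```python
from statistics import mean, stdev

def separation(tumor, normal):
    """Absolute difference in group means, scaled by the average spread."""
    spread = (stdev(tumor) + stdev(normal)) / 2
    if spread == 0:
        spread = 1e-9  # avoid division by zero for constant features
    return abs(mean(tumor) - mean(normal)) / spread

def rank_genes(expr, meth, biomarkers, prior_weight=1.5):
    """Rank genes by a combined score (assumed scheme, not the paper's).

    expr and meth map gene -> (tumor_values, normal_values); meth may
    cover only some genes. Genes listed in `biomarkers` (prior knowledge)
    have their score multiplied by the hypothetical `prior_weight`.
    """
    scores = {}
    for gene, (tumor, normal) in expr.items():
        score = separation(tumor, normal)
        if gene in meth:
            score += separation(*meth[gene])
        if gene in biomarkers:
            score *= prior_weight
        scores[gene] = score
    return sorted(scores, key=scores.get, reverse=True)

def tumor_membership(x, normal_center, tumor_center):
    """Linear fuzzy membership: 0 at the normal center, 1 at the tumor center."""
    if tumor_center == normal_center:
        return 0.5
    degree = (x - normal_center) / (tumor_center - normal_center)
    return min(1.0, max(0.0, degree))

def fuzzy_predict(sample, rules):
    """Average the tumor membership over rules (gene, normal_center, tumor_center)."""
    degrees = [tumor_membership(sample[g], n, t) for g, n, t in rules]
    return "tumor" if mean(degrees) > 0.5 else "normal"

# Toy data with placeholder gene names: (tumor_values, normal_values).
expr = {
    "GENE_A": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "GENE_B": ([5.0, 6.0, 7.0], [4.0, 5.0, 6.0]),
    "GENE_C": ([5.0, 6.0, 7.0], [4.0, 5.0, 6.0]),
}
meth = {"GENE_A": ([0.2, 0.3, 0.4], [0.6, 0.7, 0.8])}
biomarkers = {"GENE_C"}

ranking = rank_genes(expr, meth, biomarkers)
rules = [("GENE_A", 3.0, 9.0)]  # centers taken from the toy group means
print(ranking)
print(fuzzy_predict({"GENE_A": 9.5}, rules))
```

In this toy setup, GENE_B and GENE_C have identical expression profiles, so the prior weight alone places GENE_C ahead of GENE_B, which illustrates how reported biomarkers could be favored during selection.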

RESULTS

The selected gene features showed prediction accuracy close to 100% in cross validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross validation performance of the proposed model was similar to that of other integrative models or the RNA-seq only model, whereas its prediction performance on independent data was clearly better than that of the other 5 models. The gene signatures extracted with the fuzzy rule based integrative feature selection strategy were more robust and had the potential to yield better prediction results.

CONCLUSION

The results indicate that integrating the expanded methylation data covers more genes and has a greater capacity to retrieve signature genes than the original 450 K methylation data. Integrating reported biomarkers is also a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it may play an important role in cancer risk and merits further investigation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f95/6357346/2b9fd8939b01/12920_2018_451_Fig1_HTML.jpg