ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction.

Affiliations

Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea.

Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA.

Publication information

BMC Med Genomics. 2019 Jul 11;12(Suppl 5):95. doi: 10.1186/s12920-019-0512-9.

PMID: 31296201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6624178/
Abstract

BACKGROUND

Feature selection and scoring methods for biomarker detection are essential in bioinformatics. Various feature selection methods have been developed for this purpose, and several studies have employed information-theoretic approaches. However, most of these methods require long processing times. In addition, information-theoretic methods discretize continuous features, which can lead to a loss of information.
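To make the discretization drawback concrete, here is a toy sketch (not from the paper) that estimates mutual information between a continuous feature and a binary class label after equal-width binning. The binning scheme, the plug-in estimator, and the ten-point dataset are illustrative assumptions. Because the two classes alternate along the feature axis, coarse bins mix them and the estimate is close to zero, while one bin per sample reports the full H(Y) = ln 2.

```python
import math
from collections import Counter


def equal_width_bins(xs, n_bins):
    """Discretize continuous values into n_bins equal-width bins over [min, max]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)  # constant feature: everything in one bin
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin instead of bin n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in nats for two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


# Hypothetical data: class labels alternate along the continuous feature
xs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

mi_coarse = mutual_information(equal_width_bins(xs, 2), labels)
mi_fine = mutual_information(equal_width_bins(xs, 10), labels)
print(f"2 bins: {mi_coarse:.3f} nats, 10 bins: {mi_fine:.3f} nats")
# prints: 2 bins: 0.020 nats, 10 bins: 0.693 nats
```

Neither bin count is obviously right: with so few samples, one bin per sample would also report ln 2 for randomly assigned balanced labels, because plug-in estimates are biased upward as bins get finer. The score therefore hinges on a discretization choice that the continuous data never required.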

RESULTS

In this paper, we propose a novel supervised feature scoring method named ClearF. The method is suited to continuous-valued data; it follows a principle similar to that of feature selection using mutual information, with the added advantage of reduced computation time. The score calculation is motivated by the association between reconstruction error and information-theoretic measures. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given a multi-class dataset, such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of that class, and also to the entire dataset. Reconstruction is then performed to calculate the error of each feature, and the final score for each feature is defined in terms of these reconstruction errors. A simulation demonstrates the correlation between the information-theoretic measure and the proposed score. For performance validation, we compared the classification performance of the proposed method with that of various algorithms on benchmark datasets.
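The class-wise embedding and reconstruction steps can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes PCA as the low-dimensional embedding, uses the per-feature mean squared residual as the reconstruction error, and defines a score that mirrors I(X;Y) = H(X) − H(X|Y) by treating 0.5·log(error) as a Gaussian-entropy proxy (constants dropped). The abstract does not give ClearF's exact score formula, so the function names `reconstruction_error` and `classwise_scores`, the rank `k`, and the log form are all assumptions.

```python
import numpy as np


def reconstruction_error(X, k):
    """Per-feature mean squared error after a rank-k PCA embedding and reconstruction."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                 # (n_features, k) embedding basis
    X_hat = Xc @ W @ W.T                         # embed, then reconstruct
    return ((Xc - X_hat) ** 2).mean(axis=0)      # residual error of each feature


def classwise_scores(X, y, k=1, eps=1e-12):
    """Hypothetical score: entropy proxy on all data minus class-weighted proxy per class."""
    h_all = 0.5 * np.log(reconstruction_error(X, k) + eps)
    h_cond = np.zeros(X.shape[1])
    for c in np.unique(y):
        X_class = X[y == c]
        weight = len(X_class) / len(X)           # class prior p(c)
        h_cond += weight * 0.5 * np.log(reconstruction_error(X_class, k) + eps)
    return h_all - h_cond


# Hypothetical synthetic data with two classes of 200 samples each
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
z = rng.normal(0, 3, 2 * n)                      # large factor shared by both classes
X = np.column_stack([
    z + rng.normal(0, 0.1, 2 * n),               # feature 0: shared factor
    z + rng.normal(0, 0.1, 2 * n),               # feature 1: shared factor
    np.where(y == 1, 2.0, -2.0) + rng.normal(0, 0.3, 2 * n),  # feature 2: class-dependent
    rng.normal(0, 1, 2 * n),                     # feature 3: pure noise
])

scores = classwise_scores(X, y, k=1)
print(np.round(scores, 2))
```

The rank-1 embedding absorbs the shared factor both globally and within each class, so only feature 2 keeps a much larger error on the whole dataset than inside each class and receives the highest score. The choice of `k` matters: if a class-dependent feature dominated the overall variance, the whole-data embedding could absorb it as well and lower its score.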

CONCLUSIONS

The proposed method achieved higher accuracy and shorter execution times than other established methods. Moreover, in an experiment on the TCGA breast cancer dataset, the highest-scoring genes were confirmed to be strongly associated with breast cancer subtypes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/d9df0a6c9dfb/12920_2019_512_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/0429eeec7ab5/12920_2019_512_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/52f245c01c81/12920_2019_512_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/75e61db11fa7/12920_2019_512_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/dfbbca3944d1/12920_2019_512_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/829ec6975602/12920_2019_512_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/75d6a9019d60/12920_2019_512_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/77ed5d67bd12/12920_2019_512_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/fa4b9fb926c2/12920_2019_512_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/289a65955471/12920_2019_512_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4f3/6624178/b5b0c984f07e/12920_2019_512_Fig11_HTML.jpg

Similar articles

1. ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction.
   BMC Med Genomics. 2019 Jul 11;12(Suppl 5):95. doi: 10.1186/s12920-019-0512-9.
2. ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction.
   Bioengineering (Basel). 2023 Jul 10;10(7):824. doi: 10.3390/bioengineering10070824.
3. Voxel-wise adversarial semi-supervised learning for medical image segmentation.
   Comput Biol Med. 2022 Nov;150:106152. doi: 10.1016/j.compbiomed.2022.106152. Epub 2022 Sep 29.
4. Neurodynamics-driven holistic approaches to semi-supervised feature selection.
   Neural Netw. 2023 Jan;157:377-386. doi: 10.1016/j.neunet.2022.10.029. Epub 2022 Nov 3.
5. Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.
   Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.
6. Learning a discriminant graph-based embedding with feature selection for image categorization.
   Neural Netw. 2019 Mar;111:35-46. doi: 10.1016/j.neunet.2018.12.008. Epub 2018 Dec 27.
7. Identification of potential biomarkers on microarray data using distributed gene selection approach.
   Math Biosci. 2019 Sep;315:108230. doi: 10.1016/j.mbs.2019.108230. Epub 2019 Jul 18.
8. Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.
   Med Biol Eng Comput. 2023 Nov;61(11):2895-2919. doi: 10.1007/s11517-023-02892-1. Epub 2023 Aug 2.
9. Masked hypergraph learning for weakly supervised histopathology whole slide image classification.
   Comput Methods Programs Biomed. 2024 Aug;253:108237. doi: 10.1016/j.cmpb.2024.108237. Epub 2024 May 23.
10. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.
    BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.

Cited by

1. ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction.
   Bioengineering (Basel). 2023 Jul 10;10(7):824. doi: 10.3390/bioengineering10070824.
2. A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma.
   PLoS One. 2022 Sep 6;17(9):e0269126. doi: 10.1371/journal.pone.0269126. eCollection 2022.
