HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis.

Authors

Meng Yajie, Jin Min

Affiliation

College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.

Publication information

Front Cell Dev Biol. 2021 Jun 30;9:696359. doi: 10.3389/fcell.2021.696359. eCollection 2021.

DOI:10.3389/fcell.2021.696359
PMID:34277640
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8278475/
Abstract

The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing diagnostic approaches struggle to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information. We then designed a novel hierarchical feature selection method that takes CpG-gene biological associations into account and selects a compact set of superior features. Next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers. Lastly, we developed a second learning probability error ensemble model, SLPEE, which learns from new data consisting of the classifier-predicted class probability values and the actual labels, enabling self-correction of diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1).
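The second-learning step described in the abstract, in which a further model is trained on the base classifiers' predicted class probabilities paired with the actual labels, resembles generic stacked generalization. The sketch below illustrates only that general idea and is not the authors' SLPEE implementation: the abstract does not name the four classifiers or define the error term, so the base classifier outputs are simulated, and the signal strengths, noise level, and logistic meta-learner are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(0)

# Hypothetical signal strengths for four simulated base classifiers
# (assumption: the abstract does not say which classifiers are used).
SIGNAL = [0.25, 0.20, 0.15, 0.10]


def simulate_sample():
    """Return (class-1 probabilities from four simulated classifiers, true label)."""
    y = rng.randint(0, 1)
    direction = 1 if y == 1 else -1
    probs = [
        min(1.0, max(0.0, 0.5 + direction * s + rng.gauss(0.0, 0.25)))
        for s in SIGNAL
    ]
    return probs, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Second-level data: each row is the vector of predicted probabilities,
# and the target is the actual label.
train = [simulate_sample() for _ in range(300)]
test = [simulate_sample() for _ in range(200)]

w, b = fit_logistic([p for p, _ in train], [y for _, y in train])


def meta_predict(probs):
    """Predict the class from the four base probabilities with the meta-learner."""
    score = sum(wi * pi for wi, pi in zip(w, probs)) + b
    return 1 if score >= 0.0 else 0


# Accuracy of each simulated base classifier (threshold 0.5) and of the meta-learner.
base_accs = [
    sum((p[k] >= 0.5) == (y == 1) for p, y in test) / len(test)
    for k in range(len(SIGNAL))
]
meta_acc = sum(meta_predict(p) == y for p, y in test) / len(test)

print("base classifier accuracies:", base_accs)
print("second-stage accuracy:", meta_acc)
```

With real classifiers, the second-level training rows would normally come from out-of-fold predictions, so that the second-stage learner never sees probabilities produced on data the base classifiers were fitted to.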


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766d/8278475/f1ef581d82ae/fcell-09-696359-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766d/8278475/4eb9fc2fb1e5/fcell-09-696359-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766d/8278475/afef57c0babd/fcell-09-696359-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766d/8278475/c4e1a7ab2477/fcell-09-696359-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766d/8278475/e689bb9a41e4/fcell-09-696359-g005.jpg
