A filter-based feature selection approach for identifying potential biomarkers for lung cancer.

Author information

Lee In-Hee, Lushington Gerald H, Visvanathan Mahesh

Affiliation

Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66046, USA.

Publication information

J Clin Bioinforma. 2011 Mar 21;1(1):11. doi: 10.1186/2043-9113-1-11.

DOI:10.1186/2043-9113-1-11

PMID:21884628

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3164604/

Abstract

BACKGROUND

Lung cancer is the leading cause of cancer death worldwide, and its treatment depends on the type and stage of cancer detected in the patient. Molecular biomarkers that characterize the cancer phenotype are therefore a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to use genomic microarray analysis to find genes that are differentially expressed according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from a large set of differentially expressed genes, those whose expression patterns are most valuable for distinguishing phenotypes. One such technique, Biomarker Identifier (BMI), was developed to identify features that can distinguish between two data groups of interest, which makes it well suited to such studies.
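To make the idea of filter-based feature selection concrete, the following is a minimal sketch on synthetic data. It is not the BMI method: the abstract does not give BMI's scoring formula, so a per-gene Welch t-statistic is swapped in as a generic filter score. The function names (`welch_t_scores`, `select_top_k`), the 500-gene matrix, and the three planted "informative" genes are assumptions made for illustration; only the group sizes (60 and 69) are taken from the abstract.

```python
import numpy as np

def welch_t_scores(X, y):
    """Absolute Welch t-statistic of each column of X between class 1 and class 0.

    This is a generic filter score used here as a stand-in; it is NOT the BMI score.
    """
    a, b = X[y == 1], X[y == 0]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    std_err = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(mean_diff) / std_err

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = welch_t_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Synthetic stand-in for a microarray matrix (samples x genes).
rng = np.random.default_rng(0)
n_cancer, n_control, n_genes = 60, 69, 500   # group sizes from the abstract; gene count is assumed
y = np.array([1] * n_cancer + [0] * n_control)
X = rng.normal(size=(n_cancer + n_control, n_genes))

# Plant three hypothetical differentially expressed genes in the cancer group.
informative = np.array([3, 42, 137])
X[:n_cancer, informative] += 2.0

top = select_top_k(X, y, k=3)
print(sorted(top.tolist()))
```

A filter method of this kind scores each gene independently of any classifier, so the ranked list can then be passed to whichever classification algorithm is being evaluated.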

RESULTS

Microarray data containing validated genes were used to evaluate the utility of BMI in identifying markers for lung cancer. The data set comprises 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer); 7 of its genes have been confirmed by quantitative PCR to be differentially expressed. On this data set, BMI was compared with various well-known feature selection methods and was more successful than the other methods at finding genes useful for classifying cancerous samples. In addition, genes selected by BMI showed better discriminative power than those from the original study, given the same number of genes and the same classification algorithms. Pathway analysis of the BMI-selected genes allowed us to associate them with well-known cancer-related pathways.

CONCLUSIONS

Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data.
