A robust tool for discriminative analysis and feature selection in paired samples impacts the identification of the genes essential for reprogramming lung tissue to adenocarcinoma.

Affiliation

Bioinformatics Institute, A-STAR, Singapore.

Publication information

BMC Genomics. 2011 Nov 30;12 Suppl 3(Suppl 3):S24. doi: 10.1186/1471-2164-12-S3-S24.

DOI:10.1186/1471-2164-12-S3-S24

PMID:22369099

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3377915/

Abstract

BACKGROUND

Lung cancer is the leading cause of cancer deaths worldwide. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms underlying the early stages and progression of lung AC are poorly understood. There is currently no clinically applicable gene test for early diagnosis of AC or for assessing its aggressiveness. Among the major reasons for the lack of reliable diagnostic biomarkers are the extraordinary heterogeneity of cancer cells, complex and poorly studied interactions of AC cells with adjacent tissue and the immune system, gene variation across patient cohorts, measurement variability, small sample sizes and sub-optimal analytical methods. We suggest that gene expression profiling of primary tumours and adjacent tissues (PT-AT), handled with a rational statistical and bioinformatics strategy of biomarker prediction and validation, could provide significant progress in the identification of clinical biomarkers of AC. To minimise sample-to-sample variability, studies should be designed around repeated multivariate measurements in the same object (organ or tissue, e.g. PT-AT in lung) across patients, but prediction and validation on the genome scale with a small sample size is a great methodological challenge.

RESULTS

To analyse PT-AT relationships efficiently in statistical modelling, we propose an Extreme Class Discrimination (ECD) feature selection method that identifies a subset of the most discriminative variables (e.g. expressed genes). Our method consists of a paired Cross-normalization (CN) step followed by a modified sign Wilcoxon test with multivariate adjustment carried out for each variable. Using an Affymetrix U133A microarray paired dataset of 27 AC patients, we reviewed the global reprogramming of the transcriptome in human lung AC tissue versus normal lung tissue, which is associated with about 2,300 genes that discriminate the tissues with 100% accuracy. Cluster analysis applied to these genes resulted in four distinct gene groups, which we classified as associated with (i) mitotic cell cycle genes up-regulated in lung AC, (ii) silenced/suppressed genes specific for normal lung tissue, (iii) cell communication and cell motility and (iv) immune system features. Genes related to mutagenesis, specific lung cancers, the early stage of AC development, tumour aggressiveness, and metabolic pathway alterations and adaptations of cancer cells are strongly enriched in the AC PT-AT discriminative gene set. Two AC diagnostic biomarkers, SPP1 and CENPA, were successfully validated on an RT-PCR tissue array. The ECD method was systematically compared with several alternative methods and performed better; it was also validated by comparing the predicted gene set with a literature meta-signature.
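The general idea of per-gene testing on paired tumour/adjacent-tissue samples can be illustrated with a minimal sketch. This is not the ECD method: the abstract does not specify the Cross-normalization step, the modification to the Wilcoxon test, or the multivariate adjustment, so the sketch substitutes a plain exact sign test on the paired differences and a Benjamini-Hochberg correction. All function names, gene labels and expression values below are hypothetical toy inputs chosen for illustration.

```python
from math import comb


def sign_test_pvalue(diffs):
    """Two-sided exact sign test on paired differences (zeros are dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Probability of a split at least this lopsided under a fair coin, doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_paired_features(tumour, adjacent, alpha=0.05):
    """Return genes whose tumour-vs-adjacent sign test passes BH at alpha.

    tumour, adjacent: dict mapping gene -> list of values, with patients
    listed in the same order in both dicts (the paired design).
    """
    genes = sorted(tumour)
    pvals = []
    for gene in genes:
        diffs = [t - a for t, a in zip(tumour[gene], adjacent[gene])]
        pvals.append(sign_test_pvalue(diffs))
    adjusted = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, adjusted) if q <= alpha]


# Toy data for 8 hypothetical patients (not from the study).
tumour = {
    "gene_up": [9.1, 8.7, 9.4, 8.9, 9.0, 9.6, 8.8, 9.2],
    "gene_down": [3.0, 2.8, 3.3, 2.9, 3.1, 2.7, 3.2, 3.0],
    "gene_flat": [5.1, 4.9, 5.2, 4.8, 5.3, 4.7, 5.1, 4.9],
}
adjacent = {
    "gene_up": [6.0, 6.2, 5.9, 6.1, 6.3, 6.0, 5.8, 6.1],
    "gene_down": [7.0, 7.2, 6.9, 7.1, 7.3, 6.8, 7.0, 7.1],
    "gene_flat": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}

selected = select_paired_features(tumour, adjacent)
print(selected)
```

In this toy input, the first two genes change in the same direction in every patient and the third goes up in half of the pairs and down in the other half, so only the consistently changing genes pass the threshold. A sign test ignores the magnitude of each difference; a rank-based test such as the Wilcoxon signed-rank test would use it.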

CONCLUSIONS

We developed a method that identifies and selects highly discriminative variables from high-dimensional data spaces of potential biomarkers, based on a statistical analysis of paired samples when the number of samples is small. This method provides superior selection in comparison to conventional methods and can be widely used in different applications. Our method revealed at least 2,300 patho-biologically essential genes associated with the global transcriptional reprogramming of human lung epithelial cells and with lung AC aggressiveness. This gene set includes many previously published AC biomarkers, reflecting inherent disease complexity, and specifies the mechanisms of carcinogenesis in lung AC. SPP1, CENPA and many other PT-AT discriminative genes could be considered prospective diagnostic and prognostic biomarkers of lung AC.
