Lee In-Hee, Lushington Gerald H, Visvanathan Mahesh
Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66046, USA.
J Clin Bioinforma. 2011 Mar 21;1(1):11. doi: 10.1186/2043-9113-1-11.
Lung cancer is the leading cause of cancer death worldwide, and its treatment depends on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to use genomic microarray analysis to find genes that are differentially expressed according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from a large set of differentially expressed genes, those specific genes whose expression patterns are most valuable for phenotypic differentiation. One such technique, Biomarker Identifier (BMI), was developed to identify features that can distinguish between two data groups of interest, which makes it well suited to such studies.
Microarray data with validated genes were used to evaluate the utility of BMI in identifying markers for lung cancer. The data set contains 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer), and 7 genes in these data have been confirmed by quantitative PCR to be differentially expressed. Using this data set, BMI was compared with various well-known feature selection methods and was found to be more successful than the other methods in finding genes useful for classifying cancerous samples. In addition, given the same number of genes and the same classification algorithms, the genes selected by BMI showed better discriminative power than those from the original study. Through pathway analysis of the genes selected by BMI, we were able to correlate them with well-known cancer-related pathways.
Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data.
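The general idea of ranking genes by how well they separate two sample groups can be illustrated with a minimal sketch. This is not the BMI method, whose scoring is not described in this abstract; it swaps in a plain Welch t-statistic as a stand-in filter. The function names `welch_t` and `rank_genes`, the gene names, and the toy expression values are all hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_genes(expr, labels, top_k):
    """Return the top_k genes ordered by |t| between the two groups.

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 group labels in the same sample order
            (1 = cancer, 0 = no cancer in this toy setup)
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three genes measured in six samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],  # strongly separated
    "geneB": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # no group difference
    "geneC": [1.0, 1.4, 1.2, 2.0, 2.5, 1.8],  # moderately separated
}
top = rank_genes(expr, labels, top_k=2)
print(top)
```

In a comparison like the one summarized above, each feature selection method would supply its own scoring function in place of `welch_t`, and the selected genes would then be passed to the same classifiers so that discriminative power can be compared on equal terms.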