Thakur Ankita, Mishra Vijay, Jain Sunil K
Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, 400076, MH, India.
Sci Pharm. 2011 Jul-Sep;79(3):493-505. doi: 10.3797/scipharm.1105-11. Epub 2011 Jul 5.
Pathological changes in an organ or tissue may be reflected in proteomic patterns in serum. The early detection of cancer is crucial for successful treatment. Some cancers affect the concentration of certain molecules in the blood, which allows early diagnosis by analyzing the blood mass spectrum. Distinctive serum proteomic patterns could therefore potentially be used to differentiate cancer samples from non-cancer ones. Several techniques have been developed for analyzing the mass-spectrum curve and have been used to detect prostate, ovarian, breast, bladder, pancreatic, kidney, liver, and colon cancers. In the present study, we applied data mining to the diagnosis of ovarian cancer: we identified the most informative points of the mass-spectrum curve, then used Student's t-test and neural networks to determine the differences between the curves of cancer patients and healthy people. Two serum SELDI MS data sets were used to identify serum proteomic patterns that distinguish the serum of ovarian cancer cases from that of non-cancer controls. Statistical testing and genetic algorithm-based methods, respectively, were used for feature selection. The results showed that (1) data mining techniques can be applied successfully to ovarian cancer detection with reasonably high performance; and (2) the discriminatory features (proteomic patterns) can differ considerably from one selection method to another.
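The statistical feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which t-test variant was used, so Welch's form is assumed here, and the data are synthetic stand-ins for SELDI MS spectra. The function names (welch_t, rank_features, make_synthetic_spectra), the sample sizes, and the shifted indices 5 and 12 are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(cancer, control, top_k):
    """Return the indices of the top_k spectrum points with the largest |t|."""
    n_points = len(cancer[0])
    scores = []
    for j in range(n_points):
        t = welch_t([s[j] for s in cancer], [s[j] for s in control])
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_k]]


def make_synthetic_spectra(n_samples, n_points, shifted, shift, rng):
    """Generate noisy 'spectra'; points listed in `shifted` get a mean offset."""
    spectra = []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        for j in shifted:
            row[j] += shift
        spectra.append(row)
    return spectra


# Synthetic example: 30 spectra per group, 20 points each.
# Points 5 and 12 are elevated in the hypothetical "cancer" group only.
rng = random.Random(0)
cancer = make_synthetic_spectra(30, 20, shifted=[5, 12], shift=3.0, rng=rng)
control = make_synthetic_spectra(30, 20, shifted=[], shift=0.0, rng=rng)

selected = rank_features(cancer, control, top_k=2)
print("Most discriminatory points:", sorted(selected))
```

In a real analysis the selected points would then be passed to a classifier such as a neural network, and the ranking would be validated on held-out samples, since testing thousands of points at once makes chance findings likely.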