Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.
Med Biol Eng Comput. 2012 Sep;50(9):981-90. doi: 10.1007/s11517-012-0914-8. Epub 2012 May 24.
Current breast cancer research involves the study of many different prognostic factors: primary tumor size, lymph node status, tumor grade, tumor receptor status, and p53 and Ki-67 levels, among others. High-throughput microarray technologies make it possible to better understand and identify prognostic factors in breast cancer. However, the massive amounts of data these technologies generate require efficient computational techniques to uncover new and relevant biomedical knowledge. Furthermore, integrative tools are needed that effectively combine heterogeneous types of biomedical data, such as prognostic factors and expression data. The objective of this study was to integrate information from the main prognostic factors in breast cancer with whole-genome microarray data in order to identify potential associations among them. We propose applying a data mining approach, fuzzy association rule mining, to uncover these associations automatically. This paper describes the proposed methodology and illustrates how it can be applied to different breast cancer datasets. The results support known associations involving the number of copies of chromosome 17, HER2 amplification, and the expression levels of estrogen and progesterone receptors in breast cancer patients. They also confirm the correspondence between the HER2 status predicted by different testing methodologies (immunohistochemistry and fluorescence in situ hybridization). In addition, other interesting rules involving the CDC6, SOX11, and EFEMP1 genes are identified, although further detailed studies are needed to confirm these findings statistically. As part of this study, a web platform implementing the fuzzy association rule mining approach has been made freely available at http://www.genome2.ugr.es/biofar.
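To make the idea of a fuzzy association rule concrete, the following is a minimal sketch of how the support and confidence of one rule could be computed. It is not the authors' implementation or the code behind the web platform: the linear-ramp membership function, the use of the minimum as the conjunction operator, the function names, the cut-off values, and the toy expression values are all assumptions made for illustration.

```python
import numpy as np


def fuzzify(values, low, high):
    """Map raw values to membership degrees in two fuzzy sets, LOW and HIGH.

    Assumption: a simple linear ramp between the hypothetical cut-offs
    `low` and `high`; the abstract does not specify the membership functions.
    """
    values = np.asarray(values, dtype=float)
    high_mu = np.clip((values - low) / (high - low), 0.0, 1.0)
    return {"LOW": 1.0 - high_mu, "HIGH": high_mu}


def fuzzy_support(*memberships):
    """Fuzzy support of an itemset: per-sample minimum, averaged over samples."""
    return float(np.mean(np.minimum.reduce(np.stack(memberships))))


def fuzzy_confidence(antecedent, consequent):
    """Fuzzy confidence of the rule antecedent -> consequent."""
    antecedent_support = fuzzy_support(antecedent)
    if antecedent_support == 0.0:
        return 0.0
    return fuzzy_support(antecedent, consequent) / antecedent_support


# Toy data for five hypothetical patients (invented values, arbitrary units).
her2 = fuzzify([2.0, 8.0, 9.0, 3.0, 5.0], low=2.0, high=8.0)
er = fuzzify([9.0, 5.0, 2.0, 7.0, 5.0], low=2.0, high=8.0)

# Candidate rule: "HER2 is HIGH -> ER is LOW".
support = fuzzy_support(her2["HIGH"], er["LOW"])
confidence = fuzzy_confidence(her2["HIGH"], er["LOW"])
print(f"support={support:.3f} confidence={confidence:.3f}")
```

In a sketch like this, a rule would typically be retained only if both values exceed user-chosen thresholds. Because each patient contributes a degree of membership rather than a hard yes or no, values near a cut-off still count partially toward the rule.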