Chang Kai-Po, Chu Yen-Wei, Wang John
Department of Pathology, China Medical University Hospital, Taichung 404, Taiwan.
Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 402, Taiwan.
Open Med (Wars). 2019 Feb 20;14:91-98. doi: 10.1515/med-2019-0013. eCollection 2019.
Hormone receptors such as estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (Her-2) are important prognostic factors in breast cancer.
This study aimed to develop a method for retrieving hormone receptor expression status from pathology reports, because these statistics matter both for research on primary and recurrent breast cancer and for quality management in pathology laboratories.
A two-stage text mining approach based on regular-expression word and phrase matching was developed to retrieve the data.
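As an illustration only, a two-stage regular-expression extraction might look like the sketch below. The abstract does not give the actual patterns or report wording, so the pattern dictionary, the status vocabulary, the function name `extract_status`, and the sample report are all hypothetical. Stage 1 locates the segment of the report that mentions a marker; stage 2 classifies the status stated after that mention.

```python
import re

# Stage 1 (hypothetical patterns): locate the text segment that reports each marker.
MARKER_PATTERNS = {
    "ER": re.compile(r"\b(?:estrogen receptor|ER)\b", re.IGNORECASE),
    "PR": re.compile(r"\b(?:progesterone receptor|PR)\b", re.IGNORECASE),
    "Her-2": re.compile(r"\bHER[- ]?2(?:/neu)?\b", re.IGNORECASE),
}

# Stage 2 (hypothetical vocabulary): classify the status wording in that segment.
STATUS_PATTERN = re.compile(r"\b(positive|negative|equivocal)\b", re.IGNORECASE)


def extract_status(report: str) -> dict:
    """Return {marker: status} for the first mention of each marker in a report."""
    results = {}
    # Treat each line or semicolon-separated clause as one candidate segment.
    for segment in re.split(r"[;\n]+", report):
        for marker, pattern in MARKER_PATTERNS.items():
            if marker in results:
                continue
            mention = pattern.search(segment)
            if mention is None:
                continue
            # Only look at the text that follows the marker mention.
            status = STATUS_PATTERN.search(segment[mention.end():])
            results[marker] = status.group(1).lower() if status else "unknown"
    return results


# Hypothetical report text, not taken from the study's data.
sample_report = (
    "Estrogen receptor (ER): positive (90%)\n"
    "Progesterone receptor (PR): negative\n"
    "HER-2/neu: score 3+ (positive)"
)
print(extract_status(sample_report))
```

Separating the "find the marker" step from the "read the status" step keeps each pattern small, and a segment with a marker but no recognizable status is flagged as `unknown` for manual review rather than silently dropped.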
The method achieved sensitivities of 98.8%, 98.7%, and 98.4% for extraction of ER, PR, and Her-2 results, respectively. Hormone receptor expression status was successfully retrieved for 3679 primary and 44 recurrent breast cancer cases. Statistical analysis of these data showed that recurrent disease had a significantly lower ER positivity rate than primary breast cancer (54.5% vs 76.5%, p=0.001278) and a higher Her-2 positivity rate (48.8% vs 16.2%, p=9.79e-8). These results are consistent with the previous literature.
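The abstract does not state which statistical test produced these p-values. As a sketch of how two positivity rates can be compared, the example below runs a chi-square test with Yates continuity correction on a 2x2 table, using only the Python standard library. The counts are assumptions back-calculated from the reported ER percentages (24 of 44 recurrent cases, 2814 of 3679 primary cases), taking the case totals as denominators; they are not figures given in the source. The function name `yates_chi_square_p` is also hypothetical.

```python
import math


def yates_chi_square_p(pos_a: int, n_a: int, pos_b: int, n_b: int):
    """Chi-square test with Yates correction for two proportions (1 degree of freedom)."""
    a, b = pos_a, n_a - pos_a  # group A: positive, negative
    c, d = pos_b, n_b - pos_b  # group B: positive, negative
    n = n_a + n_b
    # Yates correction subtracts n/2 from |ad - bc|, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts, back-calculated from the reported ER percentages.
chi2, p = yates_chi_square_p(24, 44, 2814, 3679)
print(f"chi2 = {chi2:.3f}, p = {p:.6f}")
```

The Her-2 comparison is left out because 48.8% does not correspond to a whole number of cases out of 44, so its denominator cannot be recovered from the abstract.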
Text mining of pathology reports with the developed method may benefit research on primary and recurrent breast cancer.