Zhonghua Bing Li Xue Za Zhi. 2021 Aug 8;50(8):882-890. doi: 10.3760/cma.j.cn112151-20210427-00328.
To characterize the current status and problems of surgical pathological diagnosis of lung cancer in China, to construct a structured pathological database, and to further improve pathological standardization and the quality of scientific data. A case report form (CRF) was designed according to the diagnostic criteria for radical resection specimens of lung cancer, covering general information, smoking history, the pathological report (including molecular data), treatment, and prognosis. The original clinicopathological data of patients with primary lung cancer who underwent surgical resection at 23 centers from January 2013 to December 2017 were collected retrospectively. After desensitization (de-identification) and filtering, the raw continuous-text data were structured by natural language processing combined with a domain knowledge base. A total of 153 817 unstructured pathological reports, 57 748 molecular reports and 13 295 treatment and/or follow-up records were collected. Finally, 75 941 valid structured documents (covering 86 979 primary lesions) were obtained. The quality of the treatment and follow-up data was unsatisfactory. The number of CRF items reported increased over time and did not differ significantly between general hospitals and cancer hospitals (P<0.05). The items still rarely reported in 2017 were peripheral lung disease, pTNM stage, spread through air spaces, and pathological evaluation of the response to neoadjuvant treatment. The male-to-female ratio was 1.2∶1.0; 8 648 cases (11.39%) had a smoking history, and the ratio of smokers to non-smokers was 0.92∶1.00. The age group with the highest incidence was 60-69 years, accounting for 38.76% of cases.
The five most common pathological subtypes were adenocarcinoma (74.58%), squamous cell carcinoma (18.01%), small cell carcinoma (2.18%), adenosquamous carcinoma (1.71%) and sarcomatoid carcinoma (0.82%). Histological subtype was significantly correlated with gender, age and smoking status (P<0.05). Adenocarcinoma (58.5%) and squamous cell carcinoma (31.6%) were the main pathological types in male patients, and adenocarcinoma (91.6%) and squamous cell carcinoma (3.4%) were the main types in female patients. Adenocarcinoma (85.6%) was the main type in non-smoking patients, whereas adenocarcinoma and squamous cell carcinoma accounted for 50.6% and 37.7% of smoking patients, respectively. The proportion of adenocarcinoma decreased with age, while the proportions of squamous cell carcinoma and small cell carcinoma increased. The five most commonly used immunohistochemical (IHC) markers were TTF1, CK7, ALK-Ventana, Napsin A and p63, and the most common panel comprised 7-9 IHC markers. The overall EGFR mutation rate was 51.32% (10 335/20 139, all by PCR); the overall ALK positive rate was 6.18% (2 084/33 726; the positive rates by PCR, FISH and the IHC-Ventana platform were 3.01%, 8.93% and 6.58%, respectively); and the KRAS mutation rate was 7.01% (662/9 441, all by PCR). The positive rates of EGFR, ALK and KRAS were 58.14% (9 986/17 175), 6.59% (1 791/27 176) and 7.52% (607/8 068) in adenocarcinoma, and 5.83% (113/1 939), 0.40% (1/251) and 1.76% (15/852) in squamous cell carcinoma, respectively. Because the prognostic data were of poor quality, a valid survival analysis could not be performed. The standardization of pathological reports of lung cancer in China (including molecular testing) is generally good, but most reports are still written as unstructured continuous text. Postoperative pathological staging, pathological evaluation of the response to neoadjuvant therapy, and high-quality prognostic data need more attention and improvement.
The IHC marker panels in use are reasonably balanced but need further refinement. Structured report templates for lung cancer and intelligent structured database management are recommended to improve the standardization of pathological diagnosis and the quality of the data.
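
The molecular positive rates quoted above are simple proportions (positive cases divided by cases tested), so they can be recomputed from the counts given in the abstract. The following is a minimal sketch of that check, not part of the original study; the names `REPORTED`, `rate_percent` and `mismatches` are illustrative assumptions, and the counts and percentages are copied from the text.

```python
# Counts and percentages copied from the abstract:
# (label, positive cases, cases tested, reported rate in percent).
REPORTED = [
    ("EGFR, all cases", 10335, 20139, 51.32),
    ("ALK, all cases", 2084, 33726, 6.18),
    ("KRAS, all cases", 662, 9441, 7.01),
    ("EGFR, adenocarcinoma", 9986, 17175, 58.14),
    ("ALK, adenocarcinoma", 1791, 27176, 6.59),
    ("KRAS, adenocarcinoma", 607, 8068, 7.52),
    ("EGFR, squamous cell carcinoma", 113, 1939, 5.83),
    ("ALK, squamous cell carcinoma", 1, 251, 0.40),
    ("KRAS, squamous cell carcinoma", 15, 852, 1.76),
]


def rate_percent(positive: int, tested: int) -> float:
    """Return the positive rate as a percentage rounded to two decimals."""
    return round(100 * positive / tested, 2)


def mismatches(rows, tolerance: float = 0.005):
    """Return the rows whose recomputed rate differs from the reported rate."""
    return [
        (label, rate_percent(positive, tested), reported)
        for label, positive, tested, reported in rows
        if abs(rate_percent(positive, tested) - reported) > tolerance
    ]


if __name__ == "__main__":
    for label, positive, tested, reported in REPORTED:
        print(f"{label}: {rate_percent(positive, tested):.2f}% (reported {reported:.2f}%)")
    print("mismatches:", mismatches(REPORTED))
```

An empty mismatch list means that each reported percentage agrees with its numerator and denominator to two decimal places.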