Ancusa Versavia Maria, Trusculescu Ana Adriana, Constantinescu Amalia, Burducescu Alexandra, Fira-Mladinescu Ovidiu, Manolescu Diana Lumita, Traila Daniel, Wellmann Norbert, Oancea Cristian Iulian
Department of Computer and Information Technology, Automation and Computers Faculty, "Politehnica" University of Timisoara, Vasile Pârvan Blvd, no. 2, 300223 Timisoara, Romania.
Center for Research and Innovation in Personalized Medicine of Respiratory Diseases (CRIPMRD), 'Victor Babes' University of Medicine and Pharmacy, Eftimie Murgu Square no. 2, 300041 Timisoara, Romania.
Cancers (Basel). 2025 Jul 10;17(14):2305. doi: 10.3390/cancers17142305.
Lung cancer remains a major cause of cancer-related mortality, with regional differences in incidence and patient characteristics. This study aimed to verify and quantify a perceived dramatic increase in lung cancer cases at a Romanian center, identify distinct patient phenotypes using unsupervised machine learning, and characterize contributing factors, including demographic shifts, changes in the healthcare system, and geographic patterns. A comprehensive retrospective analysis of 4206 lung cancer patients admitted between 2013 and 2024 was conducted, with detailed molecular characterization of 398 patients from 2023 to 2024. Temporal trends were analyzed using statistical methods, while k-means clustering on 761 clinical features identified patient phenotypes. The geographic distribution, smoking patterns, respiratory comorbidities, and demographic factors were systematically characterized across the identified clusters. We confirmed an 80.5% increase in lung cancer admissions between the pre-pandemic (2013-2020) and post-pandemic (2022-2024) periods, exceeding the 51.1% increase in total hospital admissions and aligning with national Romanian trends. Five distinct patient clusters emerged: elderly never-smokers (28.9%) with the highest metastatic rates (44.3%), heavy-smoking males (27.4%), active smokers with comprehensive molecular testing (31.7%), a young mixed-gender cohort (7.3%) with balanced demographics, and extreme heavy smokers (4.8%) concentrated in rural areas (52.6%) with a severe comorbidity burden. Clusters differed significantly in age (p < 0.001), smoking intensity (p < 0.001), and geographic distribution (p < 0.001), as well as in molecular characteristics. COPD prevalence was exceptionally high (44.8-78.9%) across clusters, while COVID-19 history remained low (3.4-8.3%), suggesting a limited direct association between the pandemic and cancer.
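The comparison of the two growth figures can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the study: it assumes both percentages compare the same two periods on the same basis, and the function name `relative_share_change` is hypothetical. Under that assumption, an 80.5% rise in lung cancer admissions against a 51.1% rise in total admissions implies that lung cancer's share of all admissions grew by roughly 19.5% in relative terms.

```python
def relative_share_change(subset_growth: float, total_growth: float) -> float:
    """Relative change in the subset's share of the total.

    Both arguments are fractional growth rates (0.805 means +80.5%).
    Assumes both rates refer to the same pair of periods.
    """
    return (1.0 + subset_growth) / (1.0 + total_growth) - 1.0


# Figures reported in the abstract: +80.5% lung cancer, +51.1% all admissions.
share_change = relative_share_change(0.805, 0.511)
print(f"Relative change in lung cancer share of admissions: {share_change:.1%}")
```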
This study presents the first comprehensive machine learning-based stratification of lung cancer patients in Romania, confirming genuine epidemiological increases beyond healthcare system artifacts. The identification of five clinically meaningful phenotypes, particularly rural extreme smokers and age-stratified never-smokers, demonstrates the value of unsupervised clustering for regional healthcare planning. These findings establish frameworks for targeted screening programs, personalized treatment approaches, and resource allocation strategies tailored to specific high-risk populations, while highlighting the potential of artificial intelligence to identify actionable clinical patterns for implementing precision medicine.
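For readers unfamiliar with the clustering step, the following is a minimal sketch of k-means (Lloyd's algorithm) on standardized features. It is not the authors' pipeline: the abstract reports k-means on 761 clinical features yielding five clusters, whereas this example uses two hypothetical features (age and pack-years), synthetic patients, and k = 2, all of which are assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random initial centers
    labels = [-1] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


# Synthetic, hypothetical data: [age in years, pack-years]. Not study data.
data_rng = random.Random(42)
patients = (
    [[data_rng.gauss(75, 4), data_rng.gauss(0, 1)] for _ in range(30)]
    + [[data_rng.gauss(60, 4), data_rng.gauss(50, 5)] for _ in range(30)]
)

labels, centroids = kmeans(standardize(patients), k=2)
sizes = [labels.count(j) for j in range(2)]
print("Cluster sizes:", sizes)
```

Standardizing before clustering matters because k-means relies on Euclidean distance, so features on larger numeric scales would otherwise dominate the assignment. The number of clusters is an input to the algorithm; the abstract does not state how the study selected five.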