Zhang Rukui, Liu Zhaorui, Zhu Chaoyu, Cai Hui, Yin Kai, Zhong Fan, Liu Lei
Institute of Biomedical Sciences, Fudan University, 131 Dongan Road, Shanghai 200032, China.
Department of Gastrointestinal Surgery, Changhai Hospital, Naval Military Medical University, 168 Changhai Road, Shanghai 200433, China.
Bioengineering (Basel). 2024 Aug 9;11(8):808. doi: 10.3390/bioengineering11080808.
Clinical molecular genetic testing and molecular imaging have dramatically increased the quantity of clinical data. Combined with the widespread use of electronic health records, they are giving rise to a medical data ecosystem that calls for big-data-based models of medicine. We used big data analytics to search for similar patients in a cancer cohort, showing how artificial intelligence (AI) algorithms can be applied to clinical data processing to obtain clinically significant results, with the ultimate goal of improving healthcare management.
Most data processing algorithms rely on expert labeling and annotation. To overcome this weakness, we uniformly applied one-hot encoding to all types of clinical data, calculated the Euclidean distance between patients to measure their similarity, and subgrouped patients with an unsupervised learning model. Overall survival (OS) was analyzed to assess the clinical validity and relevance of the model.
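The encoding and distance steps can be illustrated with a minimal sketch. The patient records, field names, and distance cutoff below are hypothetical. The abstract does not name the unsupervised model, so grouping by connected components of a thresholded similarity graph is used here only as a simple stand-in.

```python
from itertools import combinations
from math import sqrt

# Toy records; field names and values are hypothetical, not from the study.
patients = {
    "P1": {"sex": "M", "location": "distal", "stage": "II"},
    "P2": {"sex": "M", "location": "distal", "stage": "II"},
    "P3": {"sex": "F", "location": "proximal", "stage": "IV"},
    "P4": {"sex": "F", "location": "proximal", "stage": "III"},
}

def one_hot(records):
    """Map every (field, value) pair to its own 0/1 column."""
    columns = sorted({(f, v) for r in records.values() for f, v in r.items()})
    vectors = {
        pid: [1 if r.get(f) == v else 0 for f, v in columns]
        for pid, r in records.items()
    }
    return columns, vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_edges(vectors, max_distance):
    """Keep an edge between two patients whose distance is within the cutoff."""
    return [
        (p, q)
        for p, q in combinations(sorted(vectors), 2)
        if euclidean(vectors[p], vectors[q]) <= max_distance
    ]

def connected_components(nodes, edges):
    """Group nodes by connectivity (assumed stand-in for the clustering step)."""
    neighbours = {n: set() for n in nodes}
    for p, q in edges:
        neighbours[p].add(q)
        neighbours[q].add(p)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(neighbours[node] - seen)
        groups.append(sorted(group))
    return groups

columns, vectors = one_hot(patients)
edges = similarity_edges(vectors, max_distance=1.5)  # hypothetical cutoff
clusters = connected_components(patients, edges)
```

With one-hot vectors, each categorical field on which two patients differ adds 2 to the squared distance, so the Euclidean distance grows with the number of mismatched fields and needs no expert-defined weighting.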
We took gastric cancer (GC) as an example to build a high-dimensional clinical patient similarity network (cPSN). In the survival analysis, Cluster_2 had the longest survival, while Cluster_5 had the worst prognosis among all the subgroups. Because patients in the same subgroup share some clinical characteristics, we analyzed the clinical features of each subgroup and found that Cluster_2 harbored more distal (lower) GCs than proximal (upper) GCs, shedding light on the debate over these two tumor locations.
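A subgroup survival comparison of this kind can be sketched as follows. The abstract does not state which estimator was used, so the Kaplan-Meier estimator is assumed here as a common choice, and the follow-up times and event flags are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed death.

    times  -- follow-up duration per patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months for two subgroups.
cluster_a = kaplan_meier([6, 12, 18, 24], [1, 0, 1, 0])
cluster_b = kaplan_meier([3, 5, 8, 10], [1, 1, 1, 0])
```

Comparing the resulting curves per cluster is one way to check whether subgroups found without labels differ in outcome. A formal comparison would also need a significance test such as the log-rank test, which is omitted here.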
Overall, we constructed a cancer-specific cPSN with excellent interpretability and clinical significance that can recapitulate patient similarity in the real world. The cPSN model is scalable, generalizable, and performs well across various data types.