Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer.

Author information

Jeong Yeonuk, Chu Jinah, Kang Juwon, Baek Seungjun, Lee Jae-Hak, Jung Dong-Sub, Kim Won-Woo, Kim Yi-Rang, Kang Jihoon, Do In-Gu

Affiliations

Oncocross Ltd., Seoul 04168, Republic of Korea.

Department of Pathology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea.

Publication information

Curr Issues Mol Biol. 2024 Jul 9;46(7):7291-7302. doi: 10.3390/cimb46070432.

Abstract

Identifying the primary site of origin of metastatic cancer is vital for guiding treatment decisions, especially for patients with cancer of unknown primary (CUP). Despite advanced diagnostic techniques, CUP remains difficult to pinpoint and is responsible for a considerable number of cancer-related fatalities. Understanding its origin is crucial for effective management and potentially improving patient outcomes. This study introduces a machine learning framework, ONCOfind-AI, that leverages transcriptome-based gene set features to enhance the accuracy of predicting the origin of metastatic cancers. We demonstrate its potential to facilitate the integration of RNA sequencing and microarray data by using gene set scores for characterization of transcriptome profiles generated from different platforms. Integrating data from different platforms resulted in improved accuracy of machine learning models for predicting cancer origins. We validated our method using external data from clinical samples collected through the Kangbuk Samsung Medical Center and Gene Expression Omnibus. The external validation results demonstrate a top-1 accuracy ranging from 0.80 to 0.86, with a top-2 accuracy of 0.90. This study highlights that incorporating biological knowledge through curated gene sets can help to merge gene expression data from different platforms, thereby enhancing the compatibility needed to develop more effective machine learning prediction models.
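
The abstract does not say how the gene set scores are computed or which classifier is used, so the following is only a minimal sketch under stated assumptions, not the ONCOfind-AI implementation. It assumes a simple rank-based score (the mean within-sample rank of a gene set's members) to illustrate why gene set features can make RNA sequencing and microarray profiles comparable, and it shows how top-1 and top-2 accuracy are typically computed. The gene names, gene sets, and probability values are hypothetical.

```python
import numpy as np

def gene_set_scores(expr, genes, gene_sets):
    """Score each sample on each gene set (assumed rank-based scoring).

    expr: (n_samples, n_genes) expression matrix from any platform.
    genes: list of gene names matching the columns of expr.
    gene_sets: dict mapping set name -> list of member gene names.
    Returns an (n_samples, n_sets) feature matrix.
    """
    # Rank genes within each sample and scale ranks to [0, 1], so the score
    # depends on gene ordering rather than platform-specific intensity units.
    ranks = expr.argsort(axis=1).argsort(axis=1) / (expr.shape[1] - 1)
    index = {g: i for i, g in enumerate(genes)}
    columns = []
    for members in gene_sets.values():
        # Keep only members measured on this platform (assumed non-empty).
        idx = [index[g] for g in members if g in index]
        columns.append(ranks[:, idx].mean(axis=1))
    return np.column_stack(columns)

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))

# Hypothetical data: one sample measured on two platforms with different scales
# but the same gene ordering.
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"SET_A": ["G1", "G3"], "SET_B": ["G2", "G4"]}
expr_rnaseq = np.array([[5.0, 1.0, 3.0, 0.5]])
expr_array = np.array([[900.0, 200.0, 600.0, 100.0]])

scores_rnaseq = gene_set_scores(expr_rnaseq, genes, gene_sets)
scores_array = gene_set_scores(expr_array, genes, gene_sets)

# Hypothetical class probabilities for three samples and three candidate origins.
probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.2, 0.3, 0.5]])
labels = [1, 2, 2]
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

In this toy case the two platforms yield identical gene set scores because only the within-sample ordering of genes matters. The second sample is misclassified at top-1 but recovered at top-2, which is the kind of gap the reported top-1 and top-2 accuracies describe.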

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ee/11276602/d435e1b981fb/cimb-46-00432-g001.jpg
