Prediction of Composite Clinical Outcomes for Childhood Neuroblastoma Using Multi-Omics Data and Machine Learning.

Authors

Wang Panru, Zhang Junying

Affiliation

School of Computer Science and Technology, Xidian University, Xi'an 710126, China.

Publication

Int J Mol Sci. 2024 Dec 27;26(1):136. doi: 10.3390/ijms26010136.

Abstract

Neuroblastoma is a common malignant tumor of childhood that seriously endangers children's health and lives, so effective prognostic markers are needed to predict clinical outcomes accurately. Advances in high-throughput technology have made multi-omics data available, and integrating such data can compensate for missing or unreliable information in any single data source. In this study, we integrated clinical data with two types of omics data, gene expression and DNA methylation, to study the prognosis of neuroblastoma. Because omics features are highly redundant, feature selection is crucial. We propose a two-step feature selection (TSFS) method that selects features quickly and accurately: the first step selects candidate features, and the second step removes redundant features among the candidates using our proposed maximal association coefficient (MAC). Our goal is to predict composite clinical outcomes for neuroblastoma patients, namely survival time and vital status at the last follow-up, which we validated to be two inter-correlated tasks. We conducted a series of experiments and evaluated the results using accuracy and AUC (area under the ROC curve). The results indicate that integrating the three types of data, applying the proposed TSFS method, and using a multi-task learning method together improve the reliability and accuracy of the prediction models.
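
The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define MAC or the criterion used to pick candidate features, so absolute Pearson correlation is used here as a stand-in for both, and the function names, feature names, threshold, and toy data are all hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def two_step_select(features, target, n_candidates, redundancy_threshold):
    """Two-step selection over `features` (dict: name -> list of values).

    Step 1: keep the n_candidates features most associated with the target.
    Step 2: walk the candidates in relevance order and drop any candidate
    that is too strongly associated with a feature already kept.
    Association is |Pearson r| here, an assumption standing in for MAC.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    candidates = sorted(relevance, key=relevance.get, reverse=True)[:n_candidates]

    selected = []
    for name in candidates:
        redundant = any(
            abs(pearson(features[name], features[kept])) > redundancy_threshold
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical toy data: two near-duplicate features, one distinct, one noisy.
toy = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene_a_copy": [2.1, 4.0, 6.2, 8.0, 10.1, 12.0],  # near-duplicate of gene_a
    "cpg_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
outcome = [1.2, 1.9, 3.4, 3.8, 5.5, 6.1]

selected = two_step_select(toy, outcome, n_candidates=3, redundancy_threshold=0.9)
print(selected)
```

On this toy data the first step discards the noisy feature, and the second step keeps only one of the two near-duplicate features alongside the distinct one. Any association measure could be swapped in for `pearson` without changing the overall structure.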

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e6/11720239/937f5cf3777d/ijms-26-00136-g001.jpg
