Mathema Vivek Bhakta, Sen Partho, Lamichhane Santosh, Orešič Matej, Khoomrung Sakda
Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
Comput Struct Biotechnol J. 2023 Jan 31;21:1372-1382. doi: 10.1016/j.csbj.2023.01.043. eCollection 2023.
Cancer progression is linked to gene-environment interactions that alter cellular homeostasis. Using biomarkers as early indicators of disease manifestation and progression can substantially improve diagnosis and treatment. Large omics datasets generated by high-throughput profiling technologies have enabled data-driven biomarker discovery. These technologies include microarrays, RNA sequencing, whole-genome shotgun sequencing, nuclear magnetic resonance, and mass spectrometry. Identifying differentially expressed traits as molecular markers has traditionally relied on statistical techniques that are often limited to linear parametric modeling. The heterogeneity, epigenetic changes, and high degree of polymorphism observed in oncogenes call for biomarker-assisted personalized medication schemes. Deep learning (DL), a major subfield of machine learning (ML), has been increasingly used in recent years to investigate various diseases. Combining ML/DL approaches to optimize performance across multi-omics datasets yields robust ensemble-learning prediction models, which are becoming useful in precision medicine. This review focuses on recent developments in ML/DL methods that provide integrative solutions for discovering cancer-related biomarkers, and on their use in precision medicine.