IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2514-2525. doi: 10.1109/TCBB.2020.2986387. Epub 2021 Dec 8.
Molecular biomarkers are molecules or sets of molecules that can aid the diagnosis or prognosis of diseases or disorders. Over the past decades, advances in high-throughput technologies have led to the accumulation of a large volume of molecular 'omics' data, e.g., transcriptomics and proteomics. The availability of these omics data makes it possible to screen for biomarkers of diseases or disorders, and a number of computational approaches have accordingly been developed to identify biomarkers by exploring the omics data. In this review, we present a comprehensive survey of recent progress in identifying molecular biomarkers with machine learning approaches. Specifically, we categorize the machine learning approaches into supervised, unsupervised, and recommendation approaches, where the biomarkers include single genes, gene sets, and small gene networks. In addition, we discuss potential problems underlying biomedical data that may pose challenges for machine learning, and suggest possible directions for future biomarker identification.