Ledesma Dakila, Symes Steven, Richards Sean
Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Tennessee, TN 37996, United States.
Department of Obstetrics and Gynecology, University of Tennessee College of Medicine, Chattanooga, Tennessee, TN 37996, United States.
Curr Med Chem. 2021;28(32):6512-6531. doi: 10.2174/0929867328666210208111821.
The use of biomarkers derived from high-throughput, complex microarray or sequencing data has necessitated machine learning for their discovery and validation. Machine learning has remained a fundamental and indispensable tool because it is both effective and efficient at extracting relevant biomarkers as features and at classifying samples to validate the discovered biomarkers.
This review aims to present the impact and ability of various machine learning methodologies and models to process the high-throughput, high-dimensional data found in mass spectrometry, microarray, and DNA/RNA sequencing; such data precluded biomarker discovery before machine learning was applied.
A broad body of literature on machine learning for biomarker discovery was reviewed, yielding 21 eligible machine learning algorithms/networks and 3 combinatory architectures spanning 17 fields of study. This literature was screened to investigate the usage and development of machine learning within the framework of biomarker discovery.
Of the 93 papers collected, 62 biomarker studies across different subfields were reviewed further: 49 employed machine learning algorithms and 13 employed neural network-based models. Through the application, innovation, and creation of tools, these machine learning methodologies enabled the discovery, accumulation, validation, and interpretation of biomarkers across varied data formats, sources, and fields of study.
Machine learning methodologies are critical to analyzing the various types of data used for biomarker discovery, such as mass spectrometry, nucleotide and protein sequencing, and image (e.g., CT scan) data. Further studies that use more standardized evaluation techniques and cutting-edge machine learning architectures may lead to more accurate and specific results.
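The two roles the abstract assigns to machine learning, selecting candidate biomarkers as features and then classifying samples to validate them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not drawn from any of the reviewed studies: the data are synthetic, the ranking score is a Welch-style t statistic, the classifier is a nearest-centroid rule, and the function names (`make_synthetic_samples`, `rank_features`, `nearest_centroid_predict`, `run_demo`) are hypothetical.

```python
import random
import statistics


def make_synthetic_samples(n_per_class=40, n_features=50, informative=(3, 17), seed=0):
    """Hypothetical data: two classes; only the `informative` features differ."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += 4.0  # shifted mean mimics a differentially abundant marker
            samples.append(row)
            labels.append(label)
    return samples, labels


def rank_features(samples, labels):
    """Rank feature indices by absolute Welch-style t statistic between the classes."""
    scores = []
    for j in range(len(samples[0])):
        a = [s[j] for s, y in zip(samples, labels) if y == 0]
        b = [s[j] for s, y in zip(samples, labels) if y == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / se, j))
    return [j for _, j in sorted(scores, reverse=True)]


def nearest_centroid_predict(train_x, train_y, test_x, features):
    """Assign each test sample to the class whose centroid is closest on `features`."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    predictions = []
    for x in test_x:
        dist = {
            label: sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        predictions.append(min(dist, key=dist.get))
    return predictions


def run_demo(seed=0, n_selected=2):
    samples, labels = make_synthetic_samples(seed=seed)
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    train_x = [samples[i] for i in train]
    train_y = [labels[i] for i in train]
    test_x = [samples[i] for i in test]
    test_y = [labels[i] for i in test]
    # Select features on the training split only, so the held-out
    # accuracy is not inflated by information from the test samples.
    selected = rank_features(train_x, train_y)[:n_selected]
    predictions = nearest_centroid_predict(train_x, train_y, test_x, selected)
    accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
    return sorted(selected), accuracy


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("selected feature indices:", selected)
    print("held-out accuracy:", accuracy)
```

Ranking features on the training split alone is a deliberate choice in this sketch: selecting on the full dataset before splitting would leak label information into the held-out evaluation, which is one of the evaluation pitfalls that more standardized techniques are meant to avoid.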