Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
Comput Biol Med. 2023 Aug;162:107024. doi: 10.1016/j.compbiomed.2023.107024. Epub 2023 May 26.
Artificial intelligence-based models and robust computational methods have expedited the data-to-knowledge trajectory in precision medicine. Although machine learning models have been widely applied in medical data analysis, several barriers remain, such as the shortage of available biosamples, prohibitive costs, rare diseases, and ethical considerations. Transcriptomics, an omics approach that studies gene activity and provides gene expression data such as microarray and RNA-Seq data, faces difficulties in biospecimen collection, particularly for mental disorders, because some psychiatric patients avoid medical care. Microarray datasets contain few samples, which makes it challenging to apply machine learning models. However, the generative adversarial network (GAN), one of the most active paradigms in deep learning, has created unprecedented momentum in data augmentation and can efficiently expand datasets. This paper proposes a novel model termed MS-ACGAN, in which the generator is fed from a bordered Gaussian distribution. In machine learning, calibration gives insight into model uncertainty and is considered a crucial step toward improving the robustness and reliability of models. Therefore, we apply calibration techniques to the classifiers and focus on estimating their predicted probabilities as accurately as possible. Additionally, we report confidence intervals, which overcome the limitations of point estimates by giving a range of expected values for each performance metric. Together, calibration and confidence intervals statistically describe the reliability of the model implemented in this study. Furthermore, we employ two quantitative measures, GAN-train and GAN-test, to demonstrate that the artificial data generated by our approach closely resembles the characteristics of the original data.
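The two reliability ideas named in the abstract, calibration and confidence intervals, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which calibration technique or interval method was used, so the sketch assumes a binary classifier, measures miscalibration with the expected calibration error (ECE), and builds a percentile bootstrap interval for accuracy. The function names `expected_calibration_error` and `bootstrap_accuracy_ci` and all parameter defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE for a binary classifier (illustrative, not the paper's metric).

    probs are predicted probabilities of class 1. Predictions are grouped
    into equal-width bins by confidence in the predicted class, and the
    gap |accuracy - mean confidence| is averaged with bin-size weights.
    """
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            mean_acc = sum(a for _, a in b) / len(b)
            ece += (len(b) / n) * abs(mean_acc - mean_conf)
    return ece


def bootstrap_accuracy_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) for accuracy.

    Uses a percentile bootstrap: resample the test cases with replacement,
    recompute accuracy each time, and take the alpha/2 and 1 - alpha/2
    quantiles of the resampled accuracies.
    """
    rng = random.Random(seed)
    n = len(y_true)
    correct = [1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred)]
    point = sum(correct) / n

    stats = []
    for _ in range(n_resamples):
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()

    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return point, lower, upper
```

A lower ECE means the predicted probabilities track the observed accuracy more closely, and a narrower bootstrap interval means the reported accuracy is less sensitive to which test samples happened to be drawn. With the small sample sizes typical of microarray data, the interval is expected to be wide, which is why reporting it alongside the point estimate is informative.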