Architectures and accuracy of artificial neural network for disease classification from omics data.

Author information

Department of Internal Medicine, University of New Mexico, Albuquerque, NM, 87131, USA.

Vanderbilt Genetics Institute, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, TN, 37232, USA.

Publication information

BMC Genomics. 2019 Mar 4;20(1):167. doi: 10.1186/s12864-019-5546-z.

Abstract

BACKGROUND

Deep learning has achieved tremendous success in numerous artificial intelligence applications and is, unsurprisingly, penetrating various biomedical domains. High-throughput omics data in the form of molecular profile matrices, such as transcriptomes and metabolomes, have long been a valuable resource for diagnosing patient status or disease stage. It is both timely and imperative to compare deep learning neural networks against classical machine learning methods on matrix-formed omics data in terms of classification accuracy and robustness.

RESULTS

Using 37 high-throughput omics datasets, covering transcriptomes and metabolomes, we evaluated the classification power of deep learning compared to traditional machine learning methods. Two representative deep learning methods, Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN), were deployed, and their architectures were explored to find those giving the best classification performance. Together with five classical supervised classification methods (Linear Discriminant Analysis, Multinomial Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machine), MLP and CNN were comparatively tested on the 37 datasets to predict disease stages or to discriminate diseased samples from normal samples. MLPs achieved the highest overall accuracy among all methods tested. More thorough analyses revealed that single-hidden-layer MLPs with ample hidden units outperformed deeper MLPs. Furthermore, MLP was one of the most robust methods against imbalanced class composition and inaccurate class labels.
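To make the "single hidden layer with ample hidden units" architecture concrete, the following is a minimal NumPy sketch of such an MLP trained on a samples-by-features matrix. It is an illustration under stated assumptions, not the authors' implementation: the synthetic data, the hidden width of 64, the ReLU activation, the learning rate, and all function names (`init_mlp`, `forward`, `train`) are hypothetical choices that do not come from the paper.

```python
import numpy as np

def init_mlp(n_features, n_hidden, n_classes, rng):
    """Random weights for a one-hidden-layer MLP (scaling chosen for ReLU)."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_features), (n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def forward(params, X):
    """Return hidden activations and softmax class probabilities."""
    H = np.maximum(0.0, X @ params["W1"] + params["b1"])  # ReLU hidden layer
    logits = H @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp_logits = np.exp(logits)
    return H, exp_logits / exp_logits.sum(axis=1, keepdims=True)

def cross_entropy(P, y):
    """Mean negative log-likelihood of the true classes."""
    return float(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))

def train(params, X, y, lr=0.05, epochs=200):
    """Full-batch gradient descent; returns the loss recorded before each update."""
    n = len(y)
    losses = []
    for _ in range(epochs):
        H, P = forward(params, X)
        losses.append(cross_entropy(P, y))
        d_logits = P.copy()
        d_logits[np.arange(n), y] -= 1.0   # gradient of cross-entropy w.r.t. logits
        d_logits /= n
        d_H = d_logits @ params["W2"].T
        d_H[H <= 0.0] = 0.0                # ReLU passes gradient only where active
        params["W2"] -= lr * (H.T @ d_logits)
        params["b2"] -= lr * d_logits.sum(axis=0)
        params["W1"] -= lr * (X.T @ d_H)
        params["b1"] -= lr * d_H.sum(axis=0)
    return losses

# Synthetic stand-in for an omics matrix (assumption): 120 samples x 50 features,
# two classes that differ in the mean of the first five features.
rng = np.random.default_rng(0)
n_per_class, n_features = 60, 50
X = rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)
X[y == 1, :5] += 1.5
X = (X - X.mean(axis=0)) / X.std(axis=0)   # per-feature standardization

params = init_mlp(n_features, n_hidden=64, n_classes=2, rng=rng)
losses = train(params, X, y)
_, P = forward(params, X)
train_accuracy = float((P.argmax(axis=1) == y).mean())
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {train_accuracy:.2f}")
```

The printed accuracy is measured on the training samples only; a comparison like the one described above would need held-out data or cross-validation, and would vary `n_hidden` and the number of hidden layers to study width against depth.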

CONCLUSION

Our results indicate that shallow MLPs (of one or two hidden layers) with ample hidden neurons are sufficient to achieve superior and robust classification performance when exploiting numerical matrix-formed omics data for diagnostic purposes. Specific observations regarding optimal network width, class imbalance tolerance, and inaccurate labeling tolerance will inform future improvement of neural network applications on functional genomics data.
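The abstract does not say how class imbalance and inaccurate labels were simulated. One common way to probe both, sketched here purely as an assumption, is to reassign a fixed fraction of training labels to a wrong class and to subsample one class before training. The helper names `flip_labels` and `subsample_class` and the fractions used are hypothetical.

```python
import random

def flip_labels(labels, fraction, classes, seed=0):
    """Return a copy of `labels` with `fraction` of entries moved to a different class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(fraction * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy

def subsample_class(labels, target, keep_fraction, seed=0):
    """Return indices keeping all samples except a random share of class `target`."""
    rng = random.Random(seed)
    target_idx = [i for i, c in enumerate(labels) if c == target]
    n_keep = max(1, round(keep_fraction * len(target_idx)))
    kept = set(rng.sample(target_idx, n_keep))
    return [i for i, c in enumerate(labels) if c != target or i in kept]

# Toy label vector (assumption): 50 normal and 50 diseased samples.
labels = ["normal"] * 50 + ["disease"] * 50

# Simulate 10% inaccurate labels.
noisy = flip_labels(labels, fraction=0.10, classes=["normal", "disease"])
n_changed = sum(a != b for a, b in zip(labels, noisy))

# Simulate imbalance by keeping only 20% of the diseased samples.
idx = subsample_class(labels, target="disease", keep_fraction=0.2)
n_disease_kept = sum(labels[i] == "disease" for i in idx)
```

Training each classifier on the perturbed labels or the subsampled indices, then scoring on clean and balanced held-out data, gives one way to compare how quickly accuracy degrades across methods.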

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6826/6399893/c42d5279d6b7/12864_2019_5546_Fig1_HTML.jpg