
Deep Learning Benchmarks on L1000 Gene Expression Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1846-1857. doi: 10.1109/TCBB.2019.2910061. Epub 2020 Dec 8.

DOI: 10.1109/TCBB.2019.2910061
PMID: 30990190
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6980363/

Abstract

Gene expression data can offer deep, physiological insights beyond the static coding of the genome alone. We believe that realizing this potential requires specialized, high-capacity machine learning methods capable of using underlying biological structure, but the development of such models is hampered by the lack of published benchmark tasks and well-characterized baselines. In this work, we establish such benchmarks and baselines by profiling many classifiers against biologically motivated tasks on two curated views of a large, public gene expression dataset (the LINCS corpus) and one privately produced dataset. We provide these two curated views of the public LINCS dataset and our benchmark tasks to enable direct comparisons to future methodological work and help spur deep learning method development on this modality. In addition to profiling a battery of traditional classifiers, including linear models, random forests, decision trees, K nearest neighbor (KNN) classifiers, and feed-forward artificial neural networks (FF-ANNs), we also test a method novel to this data modality: graph convolutional neural networks (GCNNs), which allow us to incorporate prior biological domain knowledge. We find that GCNNs can be highly performant given large datasets, whereas FF-ANNs consistently perform well. Among non-neural classifiers, linear models and KNN classifiers perform best.
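
To illustrate how a graph convolution can inject prior knowledge such as a gene-gene interaction graph, here is a minimal NumPy sketch of a single layer. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which GCNN variant was used, so the symmetric-normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W) is assumed here, and the 4-gene path graph, the function names, and the random weights are all hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each gene keeps its own signal
    deg = a_hat.sum(axis=1)              # degree including the self-loop, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_norm, weight):
    """One graph-convolution layer: ReLU(a_norm @ x @ weight).

    x:      (n_genes, in_features) per-gene features for one sample
    a_norm: (n_genes, n_genes) normalized adjacency (the assumed prior graph)
    weight: (in_features, out_features) parameters (random here, learned in practice)
    """
    return np.maximum(a_norm @ x @ weight, 0.0)


# Hypothetical toy graph: 4 genes connected in a path 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
a_norm = normalized_adjacency(adj)

rng = np.random.default_rng(0)
expression = rng.normal(size=(4, 1))  # one expression value per gene
weight = rng.normal(size=(1, 3))      # maps 1 input feature to 3 hidden features
hidden = graph_conv(expression, a_norm, weight)
print(hidden.shape)
```

Each gene's hidden representation mixes only its own value and those of its graph neighbors, which is how the layer encodes the prior structure. A feed-forward layer, by contrast, lets every gene interact with every other gene.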


