Transforming L1000 profiles to RNA-seq-like profiles with deep learning.

Affiliations

Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY, 10029, USA.

Department of Medicine, Korea University College of Medicine, Seoul, Republic of Korea.

Publication information

BMC Bioinformatics. 2022 Sep 13;23(1):374. doi: 10.1186/s12859-022-04895-5.

DOI:10.1186/s12859-022-04895-5

PMID:36100892

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9472394/

Abstract

The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to more than 30,000 chemical and genetic perturbations. In total, over 3 million L1000 profiles are currently available. Such a dataset is invaluable for discovering drug and target candidates and for inferring mechanisms of action for small molecules. The L1000 assay measures the mRNA expression of only 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein-coding genes and limits the potential for integration with other transcriptomics profiling data. Here we present a two-step deep learning model that transforms L1000 profiles to RNA-seq-like profiles. The input to the model is the measured expression of the 978 landmark genes, and the output is a vector of RNA-seq-like expression values for 23,614 genes. The model first transforms the landmark genes into RNA-seq-like 978-gene profiles using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated into the full genome space with a fully connected neural network model. The two-step model achieves a Pearson's correlation coefficient of 0.914 and a root mean square error of 1.167 when tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs. The processed RNA-seq-like profiles are made available for download, signature search, and gene-centric reverse search with unique case studies.
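The data flow described in the abstract (978 landmark genes, then 978 RNA-seq-like landmark genes, then 23,614 genes) and the two reported metrics can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two random linear maps are hypothetical stand-ins for the trained CycleGAN generator and the fully connected network, all function names are invented for illustration, and computing the Pearson correlation over all flattened entries is an assumption, since the abstract does not say how the metric was aggregated.

```python
import numpy as np

N_LANDMARK = 978   # measured L1000 landmark genes (from the abstract)
N_FULL = 23_614    # genes in the RNA-seq-like output (from the abstract)

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained networks: random linear maps.
# The real models are nonlinear and learned from data.
W_step1 = 0.01 * rng.standard_normal((N_LANDMARK, N_LANDMARK), dtype=np.float32)
W_step2 = 0.01 * rng.standard_normal((N_LANDMARK, N_FULL), dtype=np.float32)


def step1_landmark_to_rnaseq_like(l1000):
    """Placeholder for the CycleGAN generator: 978 -> 978."""
    return l1000 @ W_step1


def step2_extrapolate(landmark_rnaseq_like):
    """Placeholder for the fully connected network: 978 -> 23,614."""
    return landmark_rnaseq_like @ W_step2


def two_step_transform(l1000):
    """Chain the two steps in the order the abstract describes."""
    return step2_extrapolate(step1_landmark_to_rnaseq_like(l1000))


def pearson_r(pred, truth):
    """Pearson correlation over all entries (assumed aggregation)."""
    p = pred.ravel() - pred.mean()
    t = truth.ravel() - truth.mean()
    return float(p @ t / np.sqrt((p @ p) * (t @ t)))


def rmse(pred, truth):
    """Root mean square error over all entries."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


# A batch of 4 synthetic "L1000 profiles" (random values, not real data).
batch = rng.standard_normal((4, N_LANDMARK), dtype=np.float32)
predicted = two_step_transform(batch)
print(predicted.shape)
```

With paired data, `pearson_r(predicted, measured_rnaseq)` and `rmse(predicted, measured_rnaseq)` would be evaluated against the measured RNA-seq matrix for the same samples; `measured_rnaseq` is a hypothetical name for that matrix.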

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/763d/9472394/3d25c4e67b11/12859_2022_4895_Fig1_HTML.jpg
