A Bayesian approach to accurate and robust signature detection on LINCS L1000 data.

Author information

Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, NY 10016, USA.

Department of Astronomy, Columbia University, New York, NY 10027, USA.

Publication information

Bioinformatics. 2020 May 1;36(9):2787-2795. doi: 10.1093/bioinformatics/btaa064.

DOI:10.1093/bioinformatics/btaa064

PMID:32003771

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7203754/

Abstract

MOTIVATION

The LINCS L1000 dataset contains a large number of cellular expression profiles induced by large sets of perturbagens. Although it is an invaluable resource for drug discovery and for understanding disease mechanisms, existing peak deconvolution algorithms fail to recover accurate gene expression levels in many cases, which introduces severe noise into the dataset and limits its applications in biomedical studies.

RESULTS

Here, we present a novel Bayesian peak deconvolution algorithm that gives unbiased likelihood estimates for peak locations and characterizes the peaks with probability-based z-scores. Based on this algorithm, we build a pipeline that processes raw data from the L1000 assay into signatures that represent the features of each perturbagen. We evaluate the pipeline using the similarity between signatures of bio-replicates and between signatures of drugs with shared targets. The results show that signatures derived from our pipeline give a substantially more reliable and informative representation of perturbagens than existing methods. Thus, the new pipeline may significantly boost the performance of L1000 data in downstream applications such as drug repurposing, disease modeling and gene function prediction.
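The abstract does not spell out the statistical model, so the following is only a minimal sketch of what Bayesian peak deconvolution can look like, not the authors' implementation. It assumes, from the general design of the L1000 assay rather than from this text, that two genes share one bead color in roughly a 2:1 bead ratio, so the measured log2 intensities form a two-peak mixture. The Gaussian peak shape, the fixed `sigma`, the flat prior, the grid and all function names are hypothetical choices made for illustration.

```python
import math
import random


def log_mixture_likelihood(values, mu_major, mu_minor, sigma=0.3, w_major=2 / 3):
    """Log-likelihood of log2 intensities under a two-Gaussian mixture.

    mu_major is the peak of the gene carried by ~2/3 of the beads,
    mu_minor the peak of the gene carried by the remaining ~1/3 (assumed ratio).
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in values:
        p_major = w_major * norm * math.exp(-0.5 * ((x - mu_major) / sigma) ** 2)
        p_minor = (1.0 - w_major) * norm * math.exp(-0.5 * ((x - mu_minor) / sigma) ** 2)
        total += math.log(p_major + p_minor + 1e-300)  # guard against log(0)
    return total


def peak_posteriors(values, grid, sigma=0.3, w_major=2 / 3):
    """Marginal posterior of each peak location on a grid, with a flat prior."""
    n = len(grid)
    log_post = [
        [log_mixture_likelihood(values, a, b, sigma, w_major) for b in grid]
        for a in grid
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(max(row) for row in log_post)
    joint = [[math.exp(v - top) for v in row] for row in log_post]
    z = sum(sum(row) for row in joint)
    marg_major = [sum(joint[i]) / z for i in range(n)]
    marg_minor = [sum(joint[i][j] for i in range(n)) / z for j in range(n)]
    return marg_major, marg_minor


def posterior_mean(grid, probs):
    """Expected peak location under a discrete posterior."""
    return sum(g * p for g, p in zip(grid, probs))


# Simulated bead intensities (hypothetical values): 40 beads near 8.0, 20 near 10.0.
rng = random.Random(0)
beads = [rng.gauss(8.0, 0.3) for _ in range(40)] + [rng.gauss(10.0, 0.3) for _ in range(20)]

grid = [6.0 + 0.25 * i for i in range(25)]  # candidate peak locations, 6.0 to 12.0
marg_major, marg_minor = peak_posteriors(beads, grid)
est_major = posterior_mean(grid, marg_major)
est_minor = posterior_mean(grid, marg_minor)
```

Working with the full posterior over peak locations, rather than a single best-fit point, is what allows downstream quantities such as probability-based scores to carry the uncertainty of each peak. How the published pipeline derives its z-scores is not described in this abstract.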

AVAILABILITY AND IMPLEMENTATION

The code and the precomputed data for LINCS L1000 Phase II (GSE70138) are available at https://github.com/njpipeorgan/L1000-bayesian.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
