

Prioritizing transcriptomic and epigenomic experiments using an optimization strategy that leverages imputed data.

Affiliations

Paul G. Allen School of Computer Science and Engineering.

Department of Electrical and Computer Engineering.

Publication information

Bioinformatics. 2021 May 1;37(4):439-447. doi: 10.1093/bioinformatics/btaa830.

DOI:10.1093/bioinformatics/btaa830
PMID:32966546
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088321/
Abstract

MOTIVATION

Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types ('biosamples') and a list of possible high-throughput sequencing assays, where at least one experiment has been performed in each biosample and for each assay, we ask 'Which experiments should ENCODE perform next?'

RESULTS

We demonstrate how to represent this task as a submodular optimization problem, where the goal is to choose a panel of experiments that maximize the facility location function. A key aspect of our approach is that we use imputed data, rather than experimental data, to directly answer the posed question. We find that, across several evaluations, our method chooses a panel of experiments that span a diversity of biochemical activity. Finally, we propose two modifications of the facility location function, including a novel submodular-supermodular function, that allow incorporation of domain knowledge or constraints into the optimization procedure.
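As a rough illustration of the optimization described above, the facility location function scores a panel S of experiments as f(S) = sum over all experiments i of max over j in S of sim(i, j), so a panel scores highly when every experiment is similar to at least one selected experiment. The sketch below maximizes it with the standard greedy algorithm for monotone submodular functions. It is a minimal sketch under stated assumptions, not the kiwano API: the function names `facility_location` and `greedy_panel`, the use of plain greedy selection, and the toy similarity matrix are all hypothetical and chosen only for illustration, and similarities are assumed to be non-negative.

```python
def facility_location(similarity, panel):
    """Facility location value: each experiment contributes its similarity
    to the closest (most similar) experiment in the panel."""
    if not panel:
        return 0.0
    return sum(max(row[j] for j in panel) for row in similarity)


def greedy_panel(similarity, k):
    """Greedily pick k experiments, each time adding the one with the
    largest marginal gain in the facility location value.
    Assumes a square matrix of non-negative similarities."""
    n = len(similarity)
    panel = []
    best = [0.0] * n  # best similarity of each experiment to the current panel
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in panel:
                continue
            # Marginal gain: how much candidate j improves each experiment's coverage.
            gain = sum(max(similarity[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        panel.append(best_j)
        best = [max(best[i], similarity[i][best_j]) for i in range(n)]
    return panel


# Toy (made-up) similarity matrix: experiments 0 and 1 are near-duplicates,
# as are experiments 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
panel = greedy_panel(sim, 2)
print(panel, facility_location(sim, panel))
```

On this toy input the greedy procedure picks one experiment from each near-duplicate pair rather than two redundant ones, which is the diversity-seeking behavior the abstract describes. In the paper's setting the similarity matrix would be computed from imputed data rather than written by hand.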

AVAILABILITY AND IMPLEMENTATION

Our method is available as a Python package at https://github.com/jmschrei/kiwano and can be installed using the command pip install kiwano. The source code used here and the similarity matrix can be found at http://doi.org/10.5281/zenodo.3708538.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/928f/8088321/4a1b43f433e1/btaa830f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/928f/8088321/2cf3aba5f2ed/btaa830f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/928f/8088321/a78e365f81f3/btaa830f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/928f/8088321/6b6affb96a18/btaa830f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/928f/8088321/e448d9b21970/btaa830f4.jpg


