eHive: an artificial intelligence workflow system for genomic analysis.

Affiliation

European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

Publication information

BMC Bioinformatics. 2010 May 11;11:240. doi: 10.1186/1471-2105-11-240.

DOI: 10.1186/1471-2105-11-240
PMID: 20459813
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2885371/
Abstract

BACKGROUND

The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.
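The quadratic growth mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that every unordered pair of species is compared exactly once; the abstract does not say which pairs Ensembl actually computes, and the function name `pair_count` is hypothetical.

```python
from itertools import combinations


def pair_count(n_species: int) -> int:
    """Number of unordered species pairs: n * (n - 1) / 2."""
    return n_species * (n_species - 1) // 2


# Cross-check the closed form against explicit enumeration for a small case.
species = [f"species_{i}" for i in range(6)]
assert len(list(combinations(species, 2))) == pair_count(len(species))

# Doubling the number of species roughly quadruples the number of pairs.
for n in (25, 50, 100):
    print(n, pair_count(n))
```

Under this assumption, the 50 species cited in the abstract would give 1225 pairs, and 100 species would give 4950.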

RESULTS

We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system, a MySQL database serves as the central blackboard, and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real-case scenarios.
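The blackboard idea described above can be sketched in a few lines. This is not eHive's code or schema: eHive uses MySQL and Perl, whereas this sketch uses Python with an in-memory SQLite database as a stand-in, and the table layout, the `DATAFLOW` rule, and all function names are hypothetical.

```python
import sqlite3


def make_blackboard() -> sqlite3.Connection:
    """Create the shared job table that plays the role of the blackboard."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " analysis TEXT NOT NULL,"
        " input TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'READY')"
    )
    return db


# Hypothetical dataflow rule: a finished 'align' job seeds a 'filter' job.
DATAFLOW = {"align": "filter"}


def run_analysis(analysis: str, job_input: str) -> str:
    """Placeholder for the real computation."""
    return f"{analysis}({job_input})"


def claim_job(db: sqlite3.Connection):
    """Claim one READY job; the guarded UPDATE prevents double claims."""
    while True:
        row = db.execute(
            "SELECT id, analysis, input FROM job "
            "WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = db.execute(
            "UPDATE job SET status = 'RUN' WHERE id = ? AND status = 'READY'",
            (row[0],),
        )
        db.commit()
        if cur.rowcount == 1:
            return row


def worker(db: sqlite3.Connection) -> int:
    """Agent loop: claim jobs until none are READY; return jobs completed."""
    done = 0
    while (job := claim_job(db)) is not None:
        job_id, analysis, job_input = job
        try:
            output = run_analysis(analysis, job_input)
        except Exception:
            # Record the failure on the blackboard and move on.
            db.execute("UPDATE job SET status = 'FAILED' WHERE id = ?", (job_id,))
            db.commit()
            continue
        db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        next_analysis = DATAFLOW.get(analysis)
        if next_analysis is not None:
            db.execute(
                "INSERT INTO job (analysis, input) VALUES (?, ?)",
                (next_analysis, output),
            )
        db.commit()
        done += 1
    return done
```

In this sketch, seeding the table with two `align` jobs and calling `worker(db)` would complete four jobs: the two alignments plus the two `filter` jobs that the dataflow rule creates. The agent holds no pipeline state of its own; everything it needs is read from, and written back to, the shared table.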

CONCLUSIONS

eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/781c/2885371/061ebb767ef5/1471-2105-11-240-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/781c/2885371/31edc31a1cdf/1471-2105-11-240-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/781c/2885371/55f5b94d39a6/1471-2105-11-240-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/781c/2885371/daa102382c3f/1471-2105-11-240-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/781c/2885371/cc37333a8aec/1471-2105-11-240-5.jpg