Creating and leveraging bespoke large-scale knowledge graphs for comparative genomics and multi-omics drug discovery with SocialGene.

Author information

Clark Chase M, Kwan Jason C

Affiliation

Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA.

Publication information

bioRxiv. 2024 Aug 19:2024.08.16.608329. doi: 10.1101/2024.08.16.608329.

DOI:10.1101/2024.08.16.608329
PMID:39229008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11370487/
Abstract

The rapid expansion of multi-omics data has transformed biological research, offering unprecedented opportunities to explore complex genomic relationships across diverse organisms. However, the vast volume and heterogeneity of these datasets present significant challenges for analysis. Here we introduce SocialGene, a comprehensive software suite designed to collect, analyze, and organize multi-omics data into structured knowledge graphs, capable of handling everything from small projects to repository-scale analyses. Originally developed to enhance genome mining for natural product drug discovery, SocialGene has been effective across various applications, including functional genomics, evolutionary studies, and systems biology. SocialGene's coordinated Python and Nextflow libraries streamline data ingestion, manipulation, aggregation, and analysis, culminating in a custom Neo4j database. The software not only facilitates the exploration of genomic synteny but also provides a foundational knowledge graph that supports the integration of additional, diverse datasets and the development of advanced search engines and analyses. This manuscript introduces some of SocialGene's capabilities through brief case studies, including targeted genome mining for drug discovery, accelerated searches for similar and distantly related biosynthetic gene clusters in biobank-available organisms, integration of chemical and analytical data, and more. SocialGene is free, open-source, MIT-licensed, designed for adaptability and extension, and available from github.com/socialgene.
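The abstract does not describe SocialGene's actual graph schema or Python API, so the following is only a minimal, hypothetical sketch of the general idea: a knowledge graph linking gene clusters to the proteins they encode and proteins to their domains, then traversed to rank clusters by similar domain content. The node labels (`Cluster`, `Protein`, `Domain`), relationship names (`ENCODES`, `HAS_DOMAIN`), the placeholder domain labels (loosely modeled on polyketide synthase and nonribosomal peptide synthetase domain names), and the Jaccard scoring are all illustrative assumptions, not SocialGene's method. An in-memory dictionary stands in for the Neo4j database.

```python
# Hypothetical sketch: labels, relationships, and scoring are illustrative
# assumptions, not SocialGene's schema or API. A dict-based graph stands in
# for the Neo4j database.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # node label (e.g. "Cluster") -> set of node ids carrying that label
    nodes: dict = field(default_factory=lambda: defaultdict(set))
    # (source node id, relationship type) -> set of target node ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, src_label, src, rel, dst_label, dst):
        """Register both endpoint nodes (idempotently) and a directed, typed edge."""
        self.nodes[src_label].add(src)
        self.nodes[dst_label].add(dst)
        self.edges[(src, rel)].add(dst)

    def neighbors(self, node, rel):
        """Targets reachable from `node` along edges of type `rel`."""
        return self.edges.get((node, rel), set())


def cluster_domains(kg, cluster):
    """Domains reachable via (Cluster)-[ENCODES]->(Protein)-[HAS_DOMAIN]->(Domain)."""
    domains = set()
    for protein in kg.neighbors(cluster, "ENCODES"):
        domains |= kg.neighbors(protein, "HAS_DOMAIN")
    return domains


def jaccard(a, b):
    """Shared domains divided by all domains; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similar_clusters(kg, query, min_score=0.3):
    """Rank every other Cluster node by domain-content similarity to `query`."""
    query_domains = cluster_domains(kg, query)
    hits = []
    for other in kg.nodes["Cluster"]:
        if other == query:
            continue
        score = jaccard(query_domains, cluster_domains(kg, other))
        if score >= min_score:
            hits.append((other, round(score, 3)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy data: cluster -> {protein id: [placeholder domain labels]}
toy = {
    "query_bgc": {"p1": ["KS", "AT", "ACP"], "p2": ["KR", "ACP"]},
    "bgc_A": {"p3": ["KS", "AT", "ACP"], "p4": ["KR"]},
    "bgc_B": {"p5": ["A", "C", "PCP"], "p6": ["KS"]},
    "bgc_C": {"p7": ["A", "C", "PCP", "TE"]},
}

kg = KnowledgeGraph()
for cluster, proteins in toy.items():
    for protein, domains in proteins.items():
        kg.add_edge("Cluster", cluster, "ENCODES", "Protein", protein)
        for domain in domains:
            kg.add_edge("Protein", protein, "HAS_DOMAIN", "Domain", domain)

print(similar_clusters(kg, "query_bgc", min_score=0.1))
# [('bgc_A', 1.0), ('bgc_B', 0.143)]
```

Because each protein is its own node in this sketch, a protein shared by several clusters would only need to be annotated once, and a similarity search becomes a graph traversal rather than a rescan of raw sequence files. In a Neo4j-backed graph the same traversal would be written as a Cypher pattern match; the sketch avoids Neo4j so it runs with the standard library alone.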

Figures (image files from the PMC version):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6f7/11370487/04670e19104b/nihpp-2024.08.16.608329v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6f7/11370487/f89992df2f6d/nihpp-2024.08.16.608329v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6f7/11370487/3fae5be73421/nihpp-2024.08.16.608329v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6f7/11370487/4686f7dc8a0a/nihpp-2024.08.16.608329v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6f7/11370487/21f413779961/nihpp-2024.08.16.608329v1-f0005.jpg
