SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets.

Author affiliations

Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States.

Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad488.

Abstract

MOTIVATION

The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remain logistically challenging and introduce a major barrier to entry for even superficial integrative bioinformatics.

RESULTS

To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology.
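
The idea of a hierarchical, object-oriented container with database-like lookup can be illustrated with a minimal, self-contained Python sketch. This is not SHEPHARD's API: the class names (`Proteome`, `Protein`, `Domain`, `Site`), the method names, and the example sequence and identifier are all hypothetical, chosen only to show how annotations might hang off protein objects that are themselves indexed by a unique ID.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    position: int      # 1-indexed residue position (illustrative convention)
    site_type: str


@dataclass
class Domain:
    start: int         # 1-indexed, inclusive
    end: int           # 1-indexed, inclusive
    domain_type: str


@dataclass
class Protein:
    unique_id: str
    sequence: str
    domains: list = field(default_factory=list)
    sites: list = field(default_factory=list)

    def add_domain(self, start, end, domain_type):
        # Sanity check: the domain must lie inside the sequence.
        if not (1 <= start <= end <= len(self.sequence)):
            raise ValueError(f"domain {start}-{end} outside {self.unique_id}")
        self.domains.append(Domain(start, end, domain_type))

    def add_site(self, position, site_type):
        # Sanity check: the site must lie inside the sequence.
        if not (1 <= position <= len(self.sequence)):
            raise ValueError(f"site {position} outside {self.unique_id}")
        self.sites.append(Site(position, site_type))

    def sites_in(self, domain):
        # Cross-reference two annotation types on the same protein.
        return [s for s in self.sites if domain.start <= s.position <= domain.end]


class Proteome:
    """Hypothetical top-level container: proteins indexed by unique ID."""

    def __init__(self):
        self._proteins = {}

    def add_protein(self, unique_id, sequence):
        if unique_id in self._proteins:
            raise KeyError(f"duplicate protein ID: {unique_id}")
        protein = Protein(unique_id, sequence)
        self._proteins[unique_id] = protein
        return protein

    def protein(self, unique_id):
        return self._proteins[unique_id]

    def __iter__(self):
        return iter(self._proteins.values())


# Made-up example data, for illustration only.
proteome = Proteome()
p = proteome.add_protein("P00001", "MSTSPEEDSSRGGSYQQ")
p.add_domain(3, 10, "IDR")
p.add_site(4, "phosphosite")
p.add_site(15, "phosphosite")

# Count the sites that fall inside each annotated domain, proteome-wide.
hits = [
    (prot.unique_id, d.domain_type, len(prot.sites_in(d)))
    for prot in proteome
    for d in prot.domains
]
```

In this sketch the bounds checks in `add_domain` and `add_site` stand in for the sanity checking the abstract mentions, and the final comprehension shows the kind of Pythonic query that becomes possible once several annotation types share one hierarchy.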

AVAILABILITY AND IMPLEMENTATION

We provide SHEPHARD both as a stand-alone software package (https://github.com/holehouse-lab/shephard) and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/887a/10423030/14fa8a5d9f9b/btad488f1.jpg
