High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function.

Authors

Gao Mu, Lund-Andersen Peik, Morehead Alex, Mahmud Sajid, Chen Chen, Chen Xiao, Giri Nabin, Roy Raj S, Quadir Farhan, Effler T Chad, Prout Ryan, Abraham Subil, Elwasif Wael, Haas N Quentin, Skolnick Jeffrey, Cheng Jianlin, Sedova Ada

Affiliations

Georgia Institute of Technology, Atlanta, GA.

University of Idaho, Moscow, ID.

Publication details

Workshop Mach Learn HPC Environ. 2021 Nov;2021:46-57. doi: 10.1109/mlhpc54614.2021.00010. Epub 2021 Dec 27.

DOI: 10.1109/mlhpc54614.2021.00010
PMID: 35112110
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8802329/

Abstract

Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/3ce9374ce7f5/nihms-1769610-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/36825eef55eb/nihms-1769610-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/ce596a521a81/nihms-1769610-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/e949e9cf2ffb/nihms-1769610-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/83c8bcdd98e8/nihms-1769610-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be03/8802329/a70e8e2c3904/nihms-1769610-f0006.jpg
