Exploring the nonlinear geometry of protein homology.

Authors

Farnum Michael A, Xu Huafeng, Agrafiotis Dimitris K

Affiliation

3-Dimensional Pharmaceuticals Inc., 665 Stockton Drive, Exton, PA 19341, USA.

Publication

Protein Sci. 2003 Aug;12(8):1604-12. doi: 10.1110/ps.0379403.

DOI: 10.1110/ps.0379403
PMID: 12876310
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2323947/

Abstract

The explosion of biological data resulting from genomic and proteomic research has created a pressing need for data analysis techniques that work effectively on a large scale. An area of particular interest is the organization and visualization of large families of protein sequences. An increasingly popular approach is to embed the sequences into a low-dimensional Euclidean space in a way that preserves some predefined measure of sequence similarity. This method has been shown to produce maps that exhibit global order and continuity and reveal important evolutionary, structural, and functional relationships between the embedded proteins. However, protein sequences are related by evolutionary pathways that exhibit highly nonlinear geometry, which is invisible to classical embedding procedures such as multidimensional scaling (MDS) and nonlinear mapping (NLM). Here, we describe the use of stochastic proximity embedding (SPE) for producing Euclidean maps that preserve the intrinsic dimensionality and metric structure of the data. SPE extends previous approaches in two important ways: (1) It preserves only local relationships between closely related sequences, thus allowing the map to unfold and reveal its intrinsic dimension, and (2) it scales linearly with the number of sequences and therefore can be applied to very large protein families. The merits of the algorithm are illustrated using examples from the protein kinase and nuclear hormone receptor superfamilies.
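
The abstract names two properties of SPE: it preserves only local relationships, and its cost grows linearly with the number of sequences. The abstract itself gives no update rule, so the following is only a minimal sketch based on the pairwise refinement scheme commonly described for stochastic proximity embedding. The function name `spe_embed`, the `dist` callback, the neighborhood `cutoff`, the linear learning-rate schedule, and the default of ten pair updates per point per cycle are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def spe_embed(dist, n_points, dim=2, cutoff=1.0, n_cycles=100,
              steps_per_cycle=None, lr_start=1.0, lr_end=0.01,
              eps=1e-9, seed=0):
    """Sketch of stochastic proximity embedding (assumed update rule).

    dist(i, j) returns the target dissimilarity between items i and j,
    for example a sequence-dissimilarity score (hypothetical callback).
    """
    rng = random.Random(seed)
    # Start from random coordinates in the unit hypercube.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    if steps_per_cycle is None:
        # Assumption: work per cycle proportional to the number of points.
        steps_per_cycle = 10 * n_points
    lr = lr_start
    decrement = (lr_start - lr_end) / max(n_cycles - 1, 1)
    for _ in range(n_cycles):
        for _ in range(steps_per_cycle):
            i, j = rng.sample(range(n_points), 2)
            r = dist(i, j)                         # target distance
            d = math.dist(coords[i], coords[j])    # current map distance
            # Enforce the target only for close pairs; distant pairs are
            # adjusted only when the map places them closer than the target.
            if r <= cutoff or d < r:
                scale = lr * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    delta = scale * (coords[i][k] - coords[j][k])
                    coords[i][k] += delta
                    coords[j][k] -= delta
        lr -= decrement                            # linear annealing
    return coords
```

Each step touches a single random pair, so under these assumptions the total work is `n_cycles * steps_per_cycle` pair updates and no full distance matrix is ever built. Whether a given `cutoff` and schedule unfold a particular protein family well would need to be checked empirically.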
