Adaptive GDDA-BLAST: fast and efficient algorithm for protein sequence embedding.

Affiliation

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America.

Publication information

PLoS One. 2010 Oct 22;5(10):e13596. doi: 10.1371/journal.pone.0013596.

Abstract

A major computational challenge in the genomic era is assigning structural and functional annotations to the vast quantities of sequence information now available. This problem is illustrated by the fact that most proteins lack comprehensive annotations, even when experimental evidence exists. We previously theorized that embedded-alignment profiles (simply "alignment profiles" hereafter) provide a quantitative method capable of relating the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature of alignment profiles lies in the interoperability of data formats (e.g., alignment, physicochemical, and genomic information). Indeed, we have demonstrated that Position-Specific Scoring Matrices (PSSMs) form an informative M-dimension that is scored by quantitatively measuring embedded or unmodified sequence alignments. Moreover, these alignments remain informative even in the "twilight zone" of sequence similarity (<25% identity). Although our previous embedding strategy was powerful, it suffered from contaminating alignments (both embedded and unmodified) and high computational costs. Herein, we describe the logic and algorithmic process for a heuristic embedding strategy named "Adaptive GDDA-BLAST." Adaptive GDDA-BLAST is, on average, up to 19 times faster than our previous method while retaining similar sensitivity. Further, data are provided to demonstrate the benefits of embedded-alignment measurements in detecting structural homology in highly divergent protein sequences and in isolating secondary structural elements of transmembrane and ankyrin-repeat domains. Together, these advances allow further exploration of the embedded-alignment data space within data sets large enough to eventually support relevant statistical inferences. We show that sequence embedding could serve as one of the vehicles for measuring low-identity alignments and for incorporating them into high-performance PSSM-based alignment profiles.
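To make the "twilight zone" threshold mentioned in the abstract concrete, the following is a minimal sketch of how percent identity between two already-aligned sequences might be computed and compared against the 25% cutoff. It is not part of Adaptive GDDA-BLAST. The function names, the gap character, and the choice to count only columns where neither sequence has a gap are assumptions made for illustration; other conventions for the denominator (full alignment length, length of the shorter sequence) are also in common use and give different values.

```python
# Illustrative sketch only: not the authors' method.
# Assumption: inputs are two rows of a pairwise alignment of equal length,
# with "-" marking gaps, and identity is measured over ungapped columns.

TWILIGHT_THRESHOLD = 25.0  # percent identity cutoff quoted in the abstract


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0  # no comparable columns
    return 100.0 * matches / compared


def in_twilight_zone(aligned_a: str, aligned_b: str,
                     threshold: float = TWILIGHT_THRESHOLD) -> bool:
    """True when the pair falls below the identity threshold."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

With this convention, a pair such as `"ACDEFGHI"` and `"AKLMNPQR"` shares one identical residue in eight compared columns, so it would be classed as a twilight-zone pair.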

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8ab/2962639/cfd82b0b0210/pone.0013596.g001.jpg