Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

Publication information

Bioinformatics. 2011 Dec 15;27(24):3415-22. doi: 10.1093/bioinformatics/btr594. Epub 2011 Oct 28.

Abstract

MOTIVATION

The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened.

RESULTS

Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources.
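The abstract does not give the details of CD, so the following is only a minimal sketch of the underlying expected-count idea from contingency table analysis, not the CD algorithm itself. It assumes a plain independence model: the expected count of a second-order (pair) feature is estimated from the first-order marginal counts, and a pair observed more often than expected is a candidate cooperativity. The function name `pair_expected_count` and the toy sequences are hypothetical and chosen for illustration only.

```python
def pair_expected_count(seqs, i, a, j, b):
    """Observed and expected counts for residue `a` at position `i`
    occurring together with residue `b` at position `j`.

    The expected count assumes independence between the two positions
    (an illustrative assumption, not the model used by CD):
        expected = count_i(a) * count_j(b) / n
    """
    n = len(seqs)
    count_a = sum(1 for s in seqs if s[i] == a)   # first-order count at position i
    count_b = sum(1 for s in seqs if s[j] == b)   # first-order count at position j
    observed = sum(1 for s in seqs if s[i] == a and s[j] == b)
    expected = count_a * count_b / n
    return observed, expected


# Hypothetical toy alignment of equal-length sequences.
toy_seqs = ["AKLE", "AKLD", "GKLE", "ARVE", "GRVD", "ARLE"]
observed, expected = pair_expected_count(toy_seqs, 0, "A", 3, "E")
print(f"observed={observed}, expected={expected:.2f}")
```

In practice the observed and expected counts would be compared with a significance test before a pair is reported, since small excesses over expectation arise by chance, especially in small sequence sets.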

AVAILABILITY

Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home.

CONTACT

nacnewell@comcast.net

SUPPLEMENTARY INFORMATION

Supplementary information is available at Bioinformatics online.
