LRFragLib: an effective algorithm to identify fragments for de novo protein structure prediction.

Author information

Wang Tong, Yang Yuedong, Zhou Yaoqi, Gong Haipeng

Affiliations

MOE Key Laboratory of Bioinformatics, School of Life Sciences.

Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China.

Publication information

Bioinformatics. 2017 Mar 1;33(5):677-684. doi: 10.1093/bioinformatics/btw668.

DOI:10.1093/bioinformatics/btw668

PMID:27797773

Abstract

MOTIVATION

The quality of the fragment library determines the efficiency of fragment assembly, an approach used in most de novo protein-structure prediction algorithms. Conventional fragment libraries are constructed mainly from amino acid identities, sometimes supplemented by predicted information such as dihedral angles and secondary structures. However, it remains challenging to identify near-native fragment structures with low sequence homology.

RESULTS

We introduce a novel fragment-library-construction algorithm, LRFragLib, to improve the detection of near-native low-homology fragments of 7-10 residues, using a multi-stage, flexible selection protocol. Based on logistic regression scoring models, LRFragLib outperforms existing techniques in sampling near-native structures on recent CASP protein sets, achieving significantly higher precision and comparable coverage. The method is also comparable in computational efficiency to the fastest existing techniques, with substantially reduced memory usage.
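
As an illustration of the two metrics named above, the sketch below uses the definitions common in the fragment-library literature: precision is the fraction of library fragments that are near-native, and coverage is the fraction of target positions with at least one near-native fragment. It is not taken from LRFragLib itself. The function name, the input layout (RMSD values grouped by starting position) and the 1.5 Å cutoff are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def precision_and_coverage(
    rmsd_by_position: Dict[int, List[float]],
    cutoff: float = 1.5,  # assumed near-native RMSD threshold in angstroms
) -> Tuple[float, float]:
    """Return (precision, coverage) for a fragment library.

    rmsd_by_position maps each target start position to the RMSD values
    (fragment vs. native structure) of the fragments picked for it.
    """
    total = sum(len(values) for values in rmsd_by_position.values())
    # A fragment counts as near-native when its RMSD is below the cutoff.
    good = sum(
        1 for values in rmsd_by_position.values() for r in values if r < cutoff
    )
    # A position is covered when at least one of its fragments is near-native.
    covered = sum(
        1 for values in rmsd_by_position.values() if any(r < cutoff for r in values)
    )
    precision = good / total if total else 0.0
    coverage = covered / len(rmsd_by_position) if rmsd_by_position else 0.0
    return precision, coverage


# Toy library: three positions, eight fragments (values are made up).
library = {0: [0.8, 2.4, 1.1], 1: [3.0, 2.2], 2: [1.4, 0.9, 2.0]}
precision, coverage = precision_and_coverage(library)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

In this toy input, four of the eight fragments fall below the cutoff and two of the three positions have at least one such fragment. A library can therefore raise precision by discarding poor fragments, while coverage only drops if a position loses its last near-native fragment.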

AVAILABILITY AND IMPLEMENTATION

The source code is available for download at http://166.111.152.91/Downloads.html.

CONTACT

hgong@tsinghua.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
