Identification of putative domain linkers by a neural network - application to a large sequence database.

Author information

Miyazaki Satoshi, Kuroda Yutaka, Yokoyama Shigeyuki

Affiliation

Department of Biophysics and Biochemistry, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.

Publication information

BMC Bioinformatics. 2006 Jun 27;7:323. doi: 10.1186/1471-2105-7-323.

Abstract

BACKGROUND

The reliable dissection of large proteins into structural domains is an important issue for structural genomics/proteomics projects. To provide a practical approach to this issue, we tested the ability of a neural network to identify domain linkers in the SWISSPROT database (101602 sequences).

RESULTS

Our search detected 3009 putative domain linkers adjacent to or overlapping with domains, as defined by sequence similarity to either Protein Data Bank (PDB) or Conserved Domain Database (CDD) sequences. Among these putative linkers, 75% were "correctly" located within 20 residues of a domain terminus, and the remaining 25% were found in the middle of a domain and probably represented failed predictions. Moreover, our neural network predicted 5124 putative domain linkers in structurally unannotated regions without sequence similarity to PDB or CDD sequences, which suggests the possible existence of novel structural domains. As a comparison, we performed the same analysis by identifying low-complexity regions (LCRs), which are known to encode unstructured polypeptide segments, and observed that the fraction of LCRs that correlate with domain termini is similar to that of domain linkers. However, domain linkers and LCRs appeared to identify different types of domain boundary regions, as only 32% of the putative domain linkers overlapped with LCRs.
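The "correct" versus "failed" classification and the LCR overlap statistic reported above reduce to simple interval arithmetic. The sketch below is a hypothetical illustration, not the authors' code: it assumes that linkers, domains, and LCRs are given as 1-based inclusive (start, end) residue intervals, and that "within 20 residues of a domain terminus" means the gap between a linker and the nearest domain start or end is at most 20 residues. The function names (`near_terminus`, `fraction_near_terminus`, `fraction_overlapping`) are illustrative, and the coordinates are made up.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) residue positions.
Interval = Tuple[int, int]


def gap(position: int, region: Interval) -> int:
    """Residues between a position and a region (0 if the position lies inside it)."""
    start, end = region
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def near_terminus(linker: Interval, domains: List[Interval], window: int = 20) -> bool:
    """True if the linker lies within `window` residues of any domain start or end."""
    return any(
        gap(terminus, linker) <= window
        for d_start, d_end in domains
        for terminus in (d_start, d_end)
    )


def fraction_near_terminus(linkers: List[Interval], domains: List[Interval], window: int = 20) -> float:
    """Fraction of predicted linkers counted as 'correctly' located."""
    if not linkers:
        return 0.0
    return sum(near_terminus(l, domains, window) for l in linkers) / len(linkers)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def fraction_overlapping(linkers: List[Interval], lcrs: List[Interval]) -> float:
    """Fraction of predicted linkers that overlap at least one LCR."""
    if not linkers:
        return 0.0
    hits = sum(any(overlaps(l, r) for r in lcrs) for l in linkers)
    return hits / len(linkers)


# Toy example (hypothetical coordinates, not data from the study).
domains = [(1, 120), (150, 300)]
linkers = [(118, 130), (200, 210)]  # the second one sits mid-domain: a "failed" prediction
lcrs = [(125, 140)]
print(fraction_near_terminus(linkers, domains))  # 0.5
print(fraction_overlapping(linkers, lcrs))       # 0.5
```

Treating the 20-residue tolerance as a gap to the nearest terminus means a linker that straddles a domain end counts as correct, which matches the abstract's inclusion of linkers "adjacent to or overlapping with" domains; a stricter definition would only change `gap`.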

CONCLUSION

Overall, our study indicates that the two methods detect independent and complementary regions, and that combining them can substantially improve the sensitivity of domain boundary prediction. This finding should enable the identification of novel structural domains, yielding new targets for large-scale protein analyses.
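Why combining the two methods can raise sensitivity follows from taking the union of their predictions: a boundary missed by one predictor can still be caught by the other, and the observation that only 32% of putative linkers overlap LCRs suggests the two sets contribute largely distinct hits. The hypothetical sketch below assumes sensitivity is the fraction of known domain boundaries (single residue positions) lying within 20 residues of at least one predicted region; the paper's exact definition may differ, and the names `detects` and `sensitivity` and all coordinates are illustrative.

```python
from typing import Iterable, List, Tuple

# Assumed convention: 1-based, inclusive (start, end) residue positions.
Interval = Tuple[int, int]


def detects(region: Interval, boundary: int, window: int = 20) -> bool:
    """True if a predicted region lies within `window` residues of a boundary position."""
    start, end = region
    return start - window <= boundary <= end + window


def sensitivity(boundaries: List[int], predictions: Iterable[Interval], window: int = 20) -> float:
    """Fraction of known boundaries detected by at least one predicted region."""
    predictions = list(predictions)
    if not boundaries:
        return 0.0
    found = sum(any(detects(p, b, window) for p in predictions) for b in boundaries)
    return found / len(boundaries)


# Toy example (hypothetical coordinates, not data from the study).
boundaries = [120, 300, 450]
linker_predictions = [(118, 130)]  # catches the boundary at 120
lcr_predictions = [(440, 455)]     # catches the boundary at 450

print(sensitivity(boundaries, linker_predictions))                    # 1 of 3 boundaries
print(sensitivity(boundaries, lcr_predictions))                       # 1 of 3 boundaries
print(sensitivity(boundaries, linker_predictions + lcr_predictions))  # 2 of 3 boundaries
```

Taking the union can only add detected boundaries, so the combined sensitivity is never lower than either method alone; the price, not modeled here, is that false positives from both predictors also accumulate.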
