A time series representation of protein sequences for similarity comparison.

Author information

School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China.

College of Life Science, Zhejiang Sci-Tech University, Hangzhou 310018, China.

Publication information

J Theor Biol. 2022 Apr 7;538:111039. doi: 10.1016/j.jtbi.2022.111039. Epub 2022 Jan 24.

Abstract

Using the physicochemical indexes of the 20 amino acids and the Hungarian algorithm, each amino acid was mapped to a vector, so that a protein sequence can be represented as a time series in eleven-dimensional space. The dynamic time warping (DTW) algorithm was then applied to calculate the distance between two time series and thereby compare the similarity of protein sequences. The validity and accuracy of the method were illustrated by a similarity comparison of the ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included those of human, Malayan pangolin and six species of bats, confirmed that human had a shorter evolutionary distance to the pangolin than to those bats. Finally, a phylogenetic tree was constructed from the spike protein sequences of 36 coronaviruses, which were divided into five groups: Class I, Class II, Class III, SARS-CoVs and COVID-19.
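To make the pipeline in the abstract concrete, the following is a minimal sketch of the two steps it names: encoding a sequence as a multidimensional time series and comparing two such series with DTW. It is not the authors' implementation. The `TOY_VECTORS` table, the two-dimensional descriptors in it, and the function names `encode` and `dtw_distance` are hypothetical and chosen only for illustration; the paper's eleven-dimensional vectors and its Hungarian-algorithm mapping are not given in the abstract. The local cost is assumed to be Euclidean distance.

```python
import numpy as np

# Hypothetical 2-D descriptors for four amino acids (illustration only).
# The paper maps all 20 amino acids to eleven-dimensional vectors, which
# are not listed in the abstract.
TOY_VECTORS = {
    "A": (0.0, 1.0),
    "G": (0.0, 0.0),
    "L": (1.0, 1.0),
    "K": (1.0, 0.0),
}


def encode(sequence):
    """Turn a protein string into an (n, d) array, one row per residue."""
    return np.array([TOY_VECTORS[res] for res in sequence], dtype=float)


def dtw_distance(x, y):
    """Classic O(n*m) DTW between two multidimensional series.

    Assumes a Euclidean local cost and the standard step pattern
    (match, insertion, deletion), with no window constraint.
    """
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # step in x only
                                   acc[i, j - 1],      # step in y only
                                   acc[i - 1, j - 1])  # step in both
    return float(acc[n, m])


if __name__ == "__main__":
    a = encode("AGL")
    b = encode("AGGL")  # same residues, with one G repeated
    print(dtw_distance(a, b))
```

Because DTW lets one point align with several points in the other series, sequences of different lengths can be compared directly. A matrix of pairwise DTW distances computed this way could then be passed to a standard distance-based tree-building method, although the abstract does not say which one the authors used.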

