School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.
Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China.
BMC Bioinformatics. 2021 Jun 3;22(1):297. doi: 10.1186/s12859-021-04223-3.
Feature extraction of protein sequences is widely used in various research areas related to protein analysis, such as protein similarity analysis and prediction of protein functions or interactions.
In this study, we introduce FEGS (Feature Extraction based on Graphical and Statistical features), a novel feature extraction model for protein sequences. FEGS builds on a new technique for the graphical representation of protein sequences based on the physicochemical properties of amino acids, and it makes effective use of the statistical features of protein sequences. By fusing the graphical and statistical features, FEGS transforms a protein sequence into a 578-dimensional numerical vector. When applied to phylogenetic analysis on five protein sequence data sets, FEGS performs notably better than all of the other methods compared.
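The abstract does not list which statistical features FEGS uses. As a minimal sketch, assuming they include amino acid composition (20 values) and dipeptide composition (400 values), the code below computes those 420 features and compares sequences by Euclidean distance, the kind of pairwise distance matrix a phylogenetic analysis could start from. Under that assumption, the remaining 158 of the 578 dimensions would come from the graphical representation, which is not reproduced here. The names `composition_features`, `euclidean`, and `distance_matrix` are hypothetical and are not part of the published FEGS software.

```python
from itertools import product
from typing import List
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def composition_features(seq: str) -> List[float]:
    """Hypothetical statistical features: 20 AAC values + 400 DPC values."""
    # Keep only standard residues; nonstandard letters (X, B, Z, ...) are dropped.
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    n = len(seq)
    if n < 2:
        raise ValueError("sequence needs at least two standard residues")
    # Amino acid composition: relative frequency of each residue.
    aac = [seq.count(a) / n for a in AMINO_ACIDS]
    # Dipeptide composition: relative frequency of each overlapping pair.
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n - 1):
        pair_counts[seq[i:i + 2]] += 1
    dpc = [pair_counts[d] / (n - 1) for d in DIPEPTIDES]
    return aac + dpc


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric pairwise distance matrix over the composition features."""
    feats = [composition_features(s) for s in seqs]
    return [[euclidean(a, b) for b in feats] for a in feats]


if __name__ == "__main__":
    toy = ["MKTAYIAKQR", "MKTAYIAKQL", "GAVLIPFWMS"]  # made-up toy sequences
    for row in distance_matrix(toy):
        print(["%.3f" % d for d in row])
```

A matrix like this could then be passed to a standard distance-based tree method such as UPGMA or neighbor-joining; that clustering step is a swapped-in illustration, not a detail confirmed by the abstract.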
The carefully designed FEGS method is practically effective for extracting features from protein sequences. The current version of FEGS is designed to be user-friendly and is expected to play an important role in related studies of protein sequence analysis.