Huang Hsin-Hsiung, Girimurugan Senthil Balaji
University of Central Florida, Department of Statistics, Orlando, FL, USA.
Florida Gulf Coast University, Department of Mathematics, Fort Myers, FL, USA.
Stat Appl Genet Mol Biol. 2019 Feb 15;18(2). doi: 10.1515/sagmb-2018-0045.
In recent years, alignment-free methods have been widely applied to genome sequence comparison because they are computationally efficient and yield good phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering to construct phylogenetic trees. However, applying them directly to existing statistical classification methods may not be suitable, because a statistical classification theory that integrates with alignment-free representations is still lacking. In this article, we propose a discriminant analysis method that uses the discrete wavelet packet transform to classify whole genome sequences. The proposed alignment-free feature statistics asymptotically follow a joint normal distribution. The data analysis results indicate that the proposed method provides satisfactory classification results in real time.
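As a rough illustration of the kind of pipeline the abstract describes, the following sketch computes wavelet packet subband energies from a numerically encoded DNA string and passes them to a simple linear discriminant rule. It is not the authors' implementation: the GC-indicator encoding, the Haar filter, the decomposition level, the diagonal pooled covariance with equal priors, and the simulated sequences are all assumptions chosen to keep the example short and free of external dependencies.

```python
import math
import random


def encode(seq):
    """Map a DNA string to a 0/1 signal (1 for G or C). Hypothetical encoding."""
    return [1.0 if base in "GC" else 0.0 for base in seq]


def haar_step(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet(x, level):
    """Full wavelet packet tree: split every node at every level.

    Assumes len(x) is divisible by 2**level; returns 2**level subbands.
    """
    nodes = [x]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes


def features(seq, level=3):
    """Per-sample energy of each subband, used as an alignment-free feature vector."""
    x = encode(seq)
    return [sum(c * c for c in band) / len(x) for band in wavelet_packet(x, level)]


def fit_diagonal_lda(X, y):
    """Class means plus pooled per-feature variances (a simplified LDA)."""
    classes = sorted(set(y))
    d = len(X[0])
    means = {}
    pooled = [0.0] * d
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        for r in rows:
            for j in range(d):
                pooled[j] += (r[j] - means[c][j]) ** 2
    # Small constant guards against a zero variance.
    pooled = [v / (len(X) - len(classes)) + 1e-12 for v in pooled]
    return means, pooled


def predict(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, pooled = model

    def dist(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], pooled))

    return min(means, key=dist)


def random_genome(gc, n, rng):
    """Simulated sequence with GC content `gc` (illustrative data only)."""
    return "".join(
        rng.choice("GC") if rng.random() < gc else rng.choice("AT") for _ in range(n)
    )


rng = random.Random(0)
labels = [0, 1] * 40
seqs = [random_genome(0.35 if c == 0 else 0.65, 512, rng) for c in labels]
X = [features(s) for s in seqs]

# Train on the first 40 sequences, evaluate on the remaining 40.
model = fit_diagonal_lda(X[:40], labels[:40])
accuracy = sum(predict(model, x) == c for x, c in zip(X[40:], labels[40:])) / 40
print(accuracy)
```

Because the Haar step is orthonormal, the subband energies sum to the energy of the encoded signal, so the feature vector is a fixed-length summary of a sequence that requires no alignment. A full-covariance discriminant, a different wavelet filter, or a different nucleotide encoding could be substituted without changing the overall structure.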