Ghosh Antara, Barman Soma
Institute of Radio Physics and Electronics, University of Calcutta, 92, APC Road, Kolkata 700009, India.
Gene. 2016 Jun 1;583(2):112-120. doi: 10.1016/j.gene.2016.02.015. Epub 2016 Feb 11.
Gene systems are extremely complex, heterogeneous, and noisy. Many statistical tools used to extract relevant features from genes provide fuzzy and ambiguous information. High-dimensional gene expression databases available in the public domain usually contain thousands of genes, so efficient prediction methods are needed to identify genes accurately in such databases. Euclidean distance measurement and principal component analysis (PCA) are applied to these databases to identify the genes. In both methods, the prediction algorithm is based on a homology search approach, and digital signal processing (DSP) techniques are used together with statistical methods to analyze the genes. A two-level decision logic classifies each gene as healthy or cancerous; this binary logic minimizes the prediction error and improves prediction accuracy. The superiority of the method is assessed using receiver operating characteristic (ROC) curves.
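The abstract does not specify the feature encoding, the reference sets, or the exact decision rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each gene has already been converted into a numeric feature vector (for example by some DSP step that is not shown), treats the homology search as finding the nearest reference profile by Euclidean distance, and reads the two-level decision logic as a binary nearest-class rule. It then scores the classifier with the area under the ROC curve. The PCA branch is omitted. The function names `nearest_distance`, `classify`, and `roc_auc` and the toy vectors are hypothetical; `math.dist` is the Python 3.8+ standard-library Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: genes are already encoded as equal-length numeric feature vectors;
# the encoding (e.g. a DSP-derived spectrum) is hypothetical and not shown here.
Vector = Sequence[float]


def nearest_distance(query: Vector, references: List[Vector]) -> float:
    """Homology-search-style lookup: Euclidean distance to the closest reference."""
    return min(math.dist(query, ref) for ref in references)


def classify(query: Vector, healthy: List[Vector],
             cancerous: List[Vector]) -> Tuple[str, float]:
    """Binary (two-level) decision by nearest class (an assumed reading of the rule).

    Returns (label, score) where score = d_healthy - d_cancerous, so a larger
    score means the query sits closer to the cancerous references.
    """
    d_healthy = nearest_distance(query, healthy)
    d_cancerous = nearest_distance(query, cancerous)
    score = d_healthy - d_cancerous
    label = "cancerous" if score > 0 else "healthy"
    return label, score


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise count.

    labels: 1 = cancerous (positive), 0 = healthy (negative).
    Each positive/negative pair counts 1 if ranked correctly, 0.5 on a tie.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy feature vectors for illustration only (not data from the paper).
    healthy_refs = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]]
    cancer_refs = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.9]]
    queries = [[0.15, 0.15, 0.1], [0.85, 0.8, 0.8], [0.5, 0.4, 0.5]]
    truth = [0, 1, 1]

    results = [classify(q, healthy_refs, cancer_refs) for q in queries]
    for (label, score), y in zip(results, truth):
        print(f"{label:9s} score={score:+.3f} true={y}")
    print("AUC:", roc_auc([score for _, score in results], truth))
```

On this toy set the third query is labeled healthy at the zero threshold even though it is marked cancerous, yet the AUC is still 1.0: ROC analysis ranks the continuous scores across all thresholds instead of judging a single cut-off, which is why the sketch returns a score alongside the binary label.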