Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India.
Med Biol Eng Comput. 2021 Mar;59(3):673-691. doi: 10.1007/s11517-021-02324-y. Epub 2021 Feb 17.
Classifying the behavior of Homo sapiens genes with computational biology is a recent research trend. However, monitoring gene activity profiles and genetic behavior from the alphabetic DNA sequence by a non-invasive method remains a major challenge in functional genomics. The present paper addresses this issue and attempts to differentiate Homo sapiens genes using linear discriminant analysis (LDA). Annotated protein-coding sequences of Homo sapiens genes, collected from NCBI, are taken as test samples. A minimum entropy-based mapping (MEM) technique helps extract the maximum information from the numerical DNA sequences. The proposed LDA technique successfully classified Homo sapiens genes on the basis of the following features: composition of hydrophilic amino acids, dominance of the amino acid arginine, and magnitude and size of individual amino acids. The algorithm was tested on 84 Homo sapiens healthy and cancer genes of prostate and breast cells. Classification performance is assessed by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews correlation coefficient (81.04%), and miss rate (9.12%), and the technique outperforms four other existing classifiers. The results are cross-validated using the Rayleigh PDF and a mutual information technique. The Fisher test, two-sample t-test, and relative entropy test are used to verify the efficacy of the classifier.
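As an illustration of the kind of pipeline the abstract describes, the following minimal sketch fits a two-class Fisher linear discriminant on two-dimensional feature vectors and computes the reported performance measures from confusion-matrix counts. It is not the authors' implementation: the function names, the two-feature restriction, the midpoint threshold, the label convention (1 for cancer, 0 for healthy), and the toy data are all assumptions made for illustration, and the MEM feature extraction is not reproduced.

```python
from math import sqrt


def fisher_lda_2d(class0, class1):
    """Fit a two-class Fisher discriminant on 2-D points.

    Returns (w, threshold). A point p is assigned to class 1
    when the projection w . p exceeds the threshold.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    m0, m1 = mean(class0), mean(class1)

    # Pooled within-class scatter matrix S_w (2x2, symmetric).
    sxx = sxy = syy = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    # w = S_w^{-1} (m1 - m0), using the closed-form 2x2 inverse.
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w = ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)

    # Assumed decision rule: midpoint between the projected class means.
    proj0 = w[0] * m0[0] + w[1] * m0[1]
    proj1 = w[0] * m1[0] + w[1] * m1[1]
    return w, 0.5 * (proj0 + proj1)


def predict(w, threshold, point):
    """Return 1 (e.g. cancer) or 0 (e.g. healthy) for a 2-D point."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "miss_rate": fn / (fn + tp),
    }


# Hypothetical toy features, not data from the paper.
healthy = [(1, 2), (2, 1), (2, 3), (3, 2)]
cancer = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, threshold = fisher_lda_2d(healthy, cancer)
```

With the toy data above, `predict(w, threshold, (7, 7))` assigns the point to class 1 and `predict(w, threshold, (2, 2))` to class 0. In practice a library implementation with more than two features and proper cross-validation would be the more usual choice.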