
Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping.

Affiliation

Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India.

Publication information

Med Biol Eng Comput. 2021 Mar;59(3):673-691. doi: 10.1007/s11517-021-02324-y. Epub 2021 Feb 17.

Abstract

Classifying Homo sapiens gene behavior with computational biology is a recent research trend. However, monitoring gene activity profiles and genetic behavior from the alphabetic DNA sequence using a non-invasive method remains a major challenge in functional genomics. The present paper addresses this issue and attempts to differentiate Homo sapiens genes using linear discriminant analysis (LDA). Annotated protein-coding sequences of Homo sapiens genes, collected from NCBI, are used as test samples. A minimum entropy-based mapping (MEM) technique helps extract the maximum information from the numerical DNA sequences. The proposed LDA technique successfully classifies Homo sapiens genes based on the following features: the composition of hydrophilic amino acids, the dominance of the amino acid arginine, and the magnitude and size of individual amino acids. The proposed algorithm is successfully tested on 84 healthy and cancer Homo sapiens genes from prostate and breast cells. The classification performance of the proposed LDA technique is assessed by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews correlation coefficient (81.04%), and miss rate (9.12%), and it outperforms four other existing classifiers. The results are cross-validated using the Rayleigh probability density function (PDF) and mutual information. The Fisher test, the two-sample t-test, and a relative entropy test are used to verify the efficacy of the present classifier.
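The abstract does not give the classifier's equations or the details of the MEM mapping. As a minimal sketch only, assuming a standard two-class Fisher LDA on two hypothetical per-gene features (for example a hydrophilic-residue fraction and an arginine fraction, with made-up values), the Python below shows how a projection direction w = Sw^-1 (mu1 - mu0) and a midpoint threshold separate two classes, and how the reported metrics follow from a confusion matrix under their conventional definitions. All function names and data are illustrative assumptions, not the authors' implementation.

```python
"""Hypothetical sketch: two-class Fisher LDA plus conventional evaluation
metrics. Features and values are invented for illustration only."""
from math import sqrt


def mean(rows):
    # Column-wise mean of a list of feature vectors.
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter(rows, mu):
    # 2x2 class scatter matrix: sum over samples of (x - mu)(x - mu)^T.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def project(w, x):
    # Scalar projection of a 2-feature vector onto direction w.
    return w[0] * x[0] + w[1] * x[1]


def fisher_lda(class0, class1):
    """Return (w, threshold) with w = Sw^-1 (mu1 - mu0) for two features."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between projected class means (equal priors assumed).
    threshold = (project(w, mu0) + project(w, mu1)) / 2
    return w, threshold


def predict(w, threshold, x):
    # Label 1 (here: "cancer") if the projection exceeds the threshold.
    return int(project(w, x) > threshold)


def metrics(tp, tn, fp, fn):
    sens = tp / (tp + fn)                  # sensitivity (true positive rate)
    spec = tn / (tn + fp)                  # specificity (true negative rate)
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    prec = tp / (tp + fp)                  # precision
    f1 = 2 * prec * sens / (prec + sens)   # harmonic mean of precision/recall
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    miss = fn / (tp + fn)                  # miss rate = 1 - sensitivity
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc,
            "f1": f1, "mcc": mcc, "miss_rate": miss}


# Hypothetical feature vectors: (hydrophilic fraction, arginine fraction).
healthy = [(0.40, 0.05), (0.42, 0.06), (0.38, 0.04), (0.41, 0.05)]
cancer = [(0.50, 0.09), (0.52, 0.08), (0.48, 0.10), (0.51, 0.09)]

if __name__ == "__main__":
    w, threshold = fisher_lda(healthy, cancer)
    labels = [0] * len(healthy) + [1] * len(cancer)
    preds = [predict(w, threshold, x) for x in healthy + cancer]
    pairs = list(zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    tn = sum(p == 0 and y == 0 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    print(metrics(tp, tn, fp, fn))
```

The midpoint threshold is the simplest choice under equal class priors; with unbalanced healthy and cancer sets a prior-weighted threshold would be more appropriate. Note that the same `metrics` function applies to any classifier's confusion matrix, so it could also be used to compare LDA against other classifiers.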
