Detecting biomarkers from microarray data using distributed correlation based gene selection.

Affiliations

Department of Computer Science and Engineering, G L Bajaj Institute of Technology and Management, Greater Noida, Uttar Pradesh, India.

SRM University, Amaravati, India.

Publication information

Genes Genomics. 2020 Apr;42(4):449-465. doi: 10.1007/s13258-020-00916-w. Epub 2020 Feb 10.

Abstract

BACKGROUND

Over the past few decades, DNA microarray technology has emerged as a prevailing approach for the early identification of cancer subtypes. Several feature selection (FS) techniques have been widely applied to identify cancer from microarray gene expression data, but very few studies have examined distributing the FS process to detect cancer subtypes.

OBJECTIVE

Not all gene expression values are needed for prediction. The objective of this article is to select discriminative biomarkers using a distributed FS method, which supports accurate diagnosis of cancer subtypes. Traditional feature selection techniques have several drawbacks; for example, features that appear unrelated on their own, but that could yield good classification accuracy in combination with a suitable subset of genes, are left out of the selection.

METHOD

To overcome this issue, this paper introduces a new filter-based gene selection method that selects the genes most relevant for distinguishing tissues in a gene expression dataset. The method computes both gene-gene and gene-class relations and simultaneously identifies a subset of essential genes. It is tested on the diffuse large B-cell lymphoma (DLBCL) dataset using well-known classification techniques: support vector machine, naïve Bayes, k-nearest neighbor, and decision tree.
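The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a distributed, correlation-based filter could look. It assumes Pearson correlation for both the gene-class and gene-gene relations, a round-robin split of genes into partitions, and two illustrative thresholds (`relevance_min`, `redundancy_max`). All names, thresholds, and toy values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_in_partition(genes, labels, relevance_min=0.5, redundancy_max=0.9):
    """Greedy filter on one partition (assumed scheme, not the paper's).

    genes: dict mapping gene name -> expression values across samples.
    Keeps genes whose |gene-class correlation| is at least relevance_min,
    skipping any gene whose |gene-gene correlation| with an already
    selected gene reaches redundancy_max.
    """
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    ranked = sorted(
        (g for g in genes if relevance[g] >= relevance_min),
        key=lambda g: relevance[g],
        reverse=True,
    )
    selected = []
    for g in ranked:
        if all(abs(pearson(genes[g], genes[s])) < redundancy_max for s in selected):
            selected.append(g)
    return selected


def distributed_select(genes, labels, n_partitions=2, **thresholds):
    """Split genes round-robin into partitions, filter each, and merge the results."""
    names = sorted(genes)
    partitions = [names[i::n_partitions] for i in range(n_partitions)]
    merged = []
    for part in partitions:
        subset = {g: genes[g] for g in part}
        merged.extend(select_in_partition(subset, labels, **thresholds))
    return merged


# Toy data: six samples, two classes, four hypothetical genes.
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # tracks the class label
    "g2": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # also tracks the class label
    "g3": [5.0, 5.0, 5.1, 5.0, 5.1, 5.0],  # essentially unrelated to the class
    "g4": [4.0, 3.8, 4.1, 1.0, 1.2, 0.9],  # inversely tracks the class label
}
selected = distributed_select(genes, labels, n_partitions=2)
print(selected)
```

In this sketch each partition is filtered independently, so the per-partition calls could run in parallel. Redundancy is only checked within a partition; a final pass over the merged list would be needed to catch redundant genes that landed in different partitions.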

RESULTS

Results on the DLBCL dataset show that the proposed method is a promising tool for predicting cancer type, with a prediction accuracy of 97.62%, precision of 94.23%, sensitivity of 94.12%, F-measure of 90.12%, and ROC value of 99.75%.
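For readers less familiar with these measures, the sketch below shows how accuracy, precision, sensitivity, and F-measure are conventionally derived from binary confusion-matrix counts. The counts used are made up for illustration and are not the paper's data; the abstract does not say how its figures were averaged across classes or classifiers, and the ROC value is not computed here because it requires ranked prediction scores.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    denom = precision + sensitivity
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = binary_metrics(tp=9, fp=1, fn=3, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```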

CONCLUSION

The experimental results show that the proposed method significantly improves classification accuracy and execution time compared with existing standard algorithms applied to the non-partitioned dataset. Furthermore, the extracted genes are biologically sound and agree with the findings of relevant biomedical studies.
