Gene shaving using a sensitivity analysis of kernel based machine learning approach, with applications to cancer data.

Author information

Tulane Center of Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112, United States of America.

Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur 5200, Bangladesh.

Publication information

PLoS One. 2019 May 23;14(5):e0217027. doi: 10.1371/journal.pone.0217027. eCollection 2019.

DOI:10.1371/journal.pone.0217027

PMID:31120939

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6532884/

Abstract

BACKGROUND

Gene shaving (GS) is an essential but challenging tool for biomedical researchers because of the large number of genes in the human genome and the complex nature of biological networks. Most GS methods are not applicable to non-linear or multi-view data sets. Although kernel-based methods can overcome these problems, a well-founded GS method based on positive definite kernels has yet to be proposed for biomedical data analysis.

METHODS AND FINDINGS

Because kernel-based methods applied to genomic information can improve disease prediction, we propose a novel method, "kernel based gene shaving", which is based on the influence function of kernel canonical correlation analysis. To compare the performance of the proposed method with state-of-the-art gene shaving methods, we analyzed extensive simulated and real microarray gene expression data sets. Performance metrics, including the true positive rate, true negative rate, false positive rate, false negative rate, misclassification error rate, false discovery rate, and area under the curve, were computed for each method. In the colon cancer data analysis, the proposed method identified a significant subset of 210 genes out of 2000 and showed suggestively superior performance compared with the other methods. The proposed method can be applied to the study of other disease processes in which two-view data are common.
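The performance metrics listed above can be illustrated with a short sketch. It assumes that, in a simulation, the truly relevant genes are known and each method returns a per-gene score plus a selected subset. The function names, the dictionary keys (for example `MER` for the misclassification error rate), and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def selection_metrics(true_mask, selected_mask):
    """Confusion-matrix rates for a gene-selection result.

    true_mask, selected_mask: boolean arrays with one entry per gene.
    """
    true_mask = np.asarray(true_mask, dtype=bool)
    selected_mask = np.asarray(selected_mask, dtype=bool)
    tp = int(np.sum(true_mask & selected_mask))
    tn = int(np.sum(~true_mask & ~selected_mask))
    fp = int(np.sum(~true_mask & selected_mask))
    fn = int(np.sum(true_mask & ~selected_mask))

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),                  # true positive rate
        "TNR": ratio(tn, tn + fp),                  # true negative rate
        "FPR": ratio(fp, fp + tn),                  # false positive rate
        "FNR": ratio(fn, fn + tp),                  # false negative rate
        "MER": ratio(fp + fn, tp + tn + fp + fn),   # misclassification error rate
        "FDR": ratio(fp, fp + tp),                  # false discovery rate
    }

def auc_from_scores(true_mask, scores):
    """Area under the ROC curve via the rank-sum identity (ties count 0.5)."""
    true_mask = np.asarray(true_mask, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[true_mask], scores[~true_mask]
    if pos.size == 0 or neg.size == 0:
        return float("nan")
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy example with made-up data: 10 genes, the first 4 truly relevant.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.05, 0.0])
selected = scores >= 0.5   # hypothetical selection threshold
metrics = selection_metrics(truth, selected)
auc = auc_from_scores(truth, scores)
```

In this toy case three of the four relevant genes are selected along with one irrelevant gene, so the rates follow directly from the counts TP = 3, FN = 1, FP = 1, TN = 5.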

CONCLUSIONS

We addressed the challenge of developing a unique kernel-based GS method by using the influence function of kernel canonical correlation analysis. The proposed method has been shown to perform better than state-of-the-art gene shaving methods and has identified many more significant gene interactions, suggesting that genes act in concert in colon cancer. In similar biomedical data analyses, kernel-based methods could be applied to select a potential subset of genes. Positive definite kernel-based methods can overcome the non-linearity problem and improve the prediction process.
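The idea of ranking genes by how strongly they affect a kernel canonical correlation can be sketched as follows. This is not the authors' influence-function estimator, whose details the abstract does not give. It is a simplified leave-one-gene-out stand-in, assuming a Gaussian kernel with a median-heuristic bandwidth and one common regularized formulation in which the first kernel canonical correlation is approximated by the largest singular value of R_x R_y, with R = K (K + (n κ / 2) I)^(-1). All function names, the parameter `kappa`, and the toy data are hypothetical.

```python
import numpy as np

def centered_gaussian_gram(X, bandwidth=None):
    """Centered Gaussian (RBF) Gram matrix over the rows (samples) of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(
        sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0
    )
    if bandwidth is None:
        # Median heuristic: an assumption, not a choice stated in the abstract.
        off_diag = sq_dists[np.triu_indices_from(sq_dists, k=1)]
        median_sq = np.median(off_diag)
        bandwidth = np.sqrt(median_sq) if median_sq > 0 else 1.0
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

def first_kernel_canonical_correlation(Kx, Ky, kappa=0.1):
    """Largest singular value of R_x R_y, R = K (K + n*kappa/2 I)^(-1)."""
    n = Kx.shape[0]
    reg = 0.5 * n * kappa * np.eye(n)
    Rx = np.linalg.solve(Kx + reg, Kx)
    Ry = np.linalg.solve(Ky + reg, Ky)
    return float(np.linalg.norm(Rx @ Ry, ord=2))

def gene_sensitivity_scores(X, Y, kappa=0.1):
    """Absolute change in the correlation when each gene (column) is removed."""
    X = np.asarray(X, dtype=float)
    Ky = centered_gaussian_gram(Y)
    rho_full = first_kernel_canonical_correlation(
        centered_gaussian_gram(X), Ky, kappa
    )
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_minus_j = np.delete(X, j, axis=1)   # drop gene j
        rho_j = first_kernel_canonical_correlation(
            centered_gaussian_gram(X_minus_j), Ky, kappa
        )
        scores[j] = abs(rho_full - rho_j)
    return rho_full, scores

# Toy two-view data (made up): 40 samples, 8 genes, and a second view
# that depends non-linearly on gene 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
rho_full, scores = gene_sensitivity_scores(X, Y)
ranking = np.argsort(scores)[::-1]   # most influential gene first
```

Under these assumptions, genes with larger scores are those whose removal changes the estimated dependence between the two views the most, so a shaving step might keep the top-ranked genes. Whether this ordering matches the one produced by the paper's influence function is not something the abstract allows one to check.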

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23f8/6532884/ea0a59aefa32/pone.0217027.g001.jpg
