Interaction-based feature selection and classification for high-dimensional biological data.

Affiliation

Department of ISOM, HKUST, Clear Water Bay, Kowloon, Hong Kong.

Publication

Bioinformatics. 2012 Nov 1;28(21):2834-42. doi: 10.1093/bioinformatics/bts531. Epub 2012 Sep 3.

Abstract

MOTIVATION

Epistasis, or gene-gene interaction, has gained increasing attention in studies of complex diseases, and it has been suggested that epistasis is a ubiquitous component of the genetic architecture of common human diseases. However, detecting gene-gene interactions is difficult because of combinatorial explosion: the number of candidate gene combinations grows rapidly with the number of genes.
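To make the combinatorial explosion concrete: among p candidate genes there are C(p, k) distinct k-gene combinations that an exhaustive search would have to examine. The following minimal Python sketch counts them, assuming a hypothetical array of p = 10,000 genes (an illustrative value, not a figure from the paper); `n_candidate_sets` is a hypothetical helper name.

```python
from math import comb


def n_candidate_sets(p: int, k: int) -> int:
    """Number of distinct k-gene combinations that could be screened among p genes."""
    return comb(p, k)


if __name__ == "__main__":
    p = 10_000  # hypothetical number of genes (assumed for illustration only)
    for k in (1, 2, 3):
        print(f"k={k}: {n_candidate_sets(p, k):,} candidate sets")
    # k=1: 10,000 candidate sets
    # k=2: 49,995,000 candidate sets
    # k=3: 166,616,670,000 candidate sets
```

Moving from k to k + 1 multiplies the search space by (p - k)/(k + 1), which is why exhaustive enumeration of higher-order interactions quickly becomes impractical.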

RESULTS

We present a novel feature selection method that incorporates variable interactions. Three gene expression datasets are analyzed to illustrate the method, although it can also be applied to other types of high-dimensional data. The quality of the selected variables is evaluated in two ways: first by classification error rates, and then by functional relevance assessed using biological knowledge. First, we show that classification error rates can be significantly reduced by considering interactions. Second, a sizable portion of the genes our method identifies for breast cancer metastasis overlaps with those reported as disease-associated in the gene-to-system breast cancer (G2SBC) database, and some of these genes have interesting biological implications. In summary, interaction-based methods may yield substantial gains in biological insight as well as more accurate prediction.
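The abstract does not describe how interactions are scored. Purely as an illustration of why examining variables jointly can lower classification error, the following Python sketch builds a hypothetical toy dataset in which two "genes" determine the class only through their interaction (an XOR pattern), plus one noise gene. `partition_score` (a between-cell sum of squares over the partition induced by the chosen genes) and `cell_majority_error` (a majority-vote rule within each cell) are hypothetical helpers chosen for simplicity; they are not the paper's method.

```python
from collections import defaultdict
from itertools import product


def _cells(X, y, cols):
    """Group class labels by the joint values of the variables in `cols`."""
    cells = defaultdict(list)
    for row, label in zip(X, y):
        cells[tuple(row[c] for c in cols)].append(label)
    return cells.values()


def partition_score(X, y, cols):
    """Between-cell sum of squares of the 0/1 label over the partition induced by `cols`.
    It is zero when every cell has the same label mean as the whole sample."""
    ybar = sum(y) / len(y)
    return sum(len(v) * (sum(v) / len(v) - ybar) ** 2 for v in _cells(X, y, cols))


def cell_majority_error(X, y, cols):
    """Training error of a rule that predicts the majority label within each cell."""
    errors = sum(min(sum(v), len(v) - sum(v)) for v in _cells(X, y, cols))
    return errors / len(y)


# Hypothetical toy data: genes 0 and 1 act only jointly (class = XOR), gene 2 is noise.
X, y = [], []
for g0, g1, g2 in product((0, 1), repeat=3):
    for _ in range(5):
        X.append((g0, g1, g2))
        y.append(g0 ^ g1)

if __name__ == "__main__":
    for cols in [(0,), (1,), (2,), (0, 1), (0, 2)]:
        print(cols, partition_score(X, y, cols), cell_majority_error(X, y, cols))
    # (0,) 0.0 0.5
    # (1,) 0.0 0.5
    # (2,) 0.0 0.5
    # (0, 1) 10.0 0.0
    # (0, 2) 0.0 0.5
```

Each gene on its own scores 0 and yields a 50% error rate, so screening one variable at a time would discard genes 0 and 1; scored as a pair, they separate the two classes perfectly.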

Similar articles

switchBox: an R package for k-Top Scoring Pairs classifier development.
Bioinformatics. 2015 Jan 15;31(2):273-4. doi: 10.1093/bioinformatics/btu622. Epub 2014 Sep 26.

Cited by

Framework for making better predictions by directly estimating variables' predictivity.
Proc Natl Acad Sci U S A. 2016 Dec 13;113(50):14277-14282. doi: 10.1073/pnas.1616647113. Epub 2016 Nov 29.
A fast and powerful W-test for pairwise epistasis testing.
Nucleic Acids Res. 2016 Jul 8;44(12):e115. doi: 10.1093/nar/gkw347. Epub 2016 Apr 25.
Why significant variables aren't automatically good predictors.
Proc Natl Acad Sci U S A. 2015 Nov 10;112(45):13892-7. doi: 10.1073/pnas.1518285112. Epub 2015 Oct 26.
