Prediction and prioritization of rare oncogenic mutations in the cancer Kinome using novel features and multiple classifiers.

Author information

U ManChon, Talevich Eric, Katiyar Samiksha, Rasheed Khaled, Kannan Natarajan

Affiliations

Department of Computer Science, University of Georgia, Athens, Georgia, United States of America.

Department of Dermatology, University of California San Francisco, San Francisco, California, United States of America.

Publication information

PLoS Comput Biol. 2014 Apr 17;10(4):e1003545. doi: 10.1371/journal.pcbi.1003545. eCollection 2014 Apr.

Abstract

Cancer is a genetic disease that develops through a series of somatic mutations, a subset of which drive cancer progression. Although cancer genome sequencing studies are beginning to reveal the mutational patterns of genes in various cancers, distinguishing the small subset of "causative" mutations from the much larger set of "non-causative" mutations that accumulate as a consequence of the disease remains a challenge. In this article, we present an effective machine learning approach for identifying cancer-associated mutations in human protein kinases, a class of signaling proteins known to be frequently mutated in human cancers. We evaluate the performance of 11 well-known supervised learners and show that a multiple-classifier approach, which combines the outputs of the individual learners, significantly improves the classification of known cancer-associated mutations. We introduce several novel features related specifically to the structural and functional characteristics of protein kinases and find that the level of conservation of the mutated residue at specific evolutionary depths is an important predictor of oncogenic effect. We combine the novel features and the multiple-classifier approach to prioritize and experimentally test a set of rare, unconfirmed mutations in the epidermal growth factor receptor (EGFR) tyrosine kinase. Our studies identify T725M and L861R as rare cancer-associated mutations, because in cell-based assays these mutations increase EGFR activity in the absence of the activating ligand, EGF.
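The abstract does not state how the outputs of the 11 learners are combined. As a minimal sketch of one common multiple-classifier scheme, assuming each already-trained classifier emits a binary label per mutation (1 = cancer-associated, 0 = not), the code below combines labels by majority vote and ranks candidate mutations by the fraction of classifiers that flag them. The function names, the tie-breaking rule and the toy votes are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine binary labels from several classifiers by majority vote.

    predictions[i][j] is classifier i's label for mutation j
    (1 = cancer-associated, 0 = not). Ties are broken toward 1 so that
    borderline mutations are kept for follow-up (an illustrative choice).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must label the same mutations")
    combined = []
    for j in range(n):
        votes = Counter(p[j] for p in predictions)
        combined.append(1 if votes[1] >= votes[0] else 0)
    return combined


def rank_by_vote_fraction(
    names: Sequence[str], predictions: Sequence[Sequence[int]]
) -> list[tuple[str, float]]:
    """Rank mutations by the fraction of classifiers labelling them 1.

    Hypothetical prioritization step: higher fraction = higher priority.
    Python's sort is stable, so tied mutations keep their input order.
    """
    k = len(predictions)
    scored = [(name, sum(p[j] for p in predictions) / k) for j, name in enumerate(names)]
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    # Toy, made-up labels from three hypothetical classifiers for three
    # hypothetical mutations; not results from the study.
    names = ["mut_A", "mut_B", "mut_C"]
    preds = [[1, 0, 1], [1, 0, 0], [0, 0, 1]]
    print(majority_vote(preds))
    print(rank_by_vote_fraction(names, preds))
```

Hard majority voting is only one option; averaging predicted probabilities (soft voting) or training a meta-classifier on the individual outputs (stacking) are common alternatives when the learners produce calibrated scores.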

Figure (pcbi.1003545.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f32/3990476/07575ece2858/pcbi.1003545.g001.jpg