DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost.

Affiliation

School of Computer Science and Engineering, Pusan National University, 63 Busandaehak-Ro, Busan 46241, Republic of Korea.

Publication information

J Bioinform Comput Biol. 2021 Dec;19(6):2140017. doi: 10.1142/S0219720021400175. Epub 2021 Dec 13.

Abstract

Detecting somatic mutations in whole-exome sequencing data can help elucidate the mechanisms of tumor progression. Most computational approaches require exome sequencing of both tumor and normal samples. However, it is more common to sequence exomes of tumor samples only, without paired normal samples. To include such data in extensive studies of tumorigenesis, an approach is needed that identifies somatic mutations from tumor exome sequencing data alone. In this study, we designed a machine learning approach that uses a Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data, and we integrated it into a pipeline called DNN-Boost. The XGBoost algorithm extracts features from the results of variant callers, and these features are then fed into the DNN model as input. XGBoost also resolves issues of missing values and overfitting. We evaluated the proposed model and compared its performance with that of existing benchmark methods. The DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from both paired tumor-normal exome data and tumor-only exome data.
