Huang Weitai, Guo Yu Amanda, Muthukumar Karthik, Baruah Probhonjon, Chang Mei Mei, Jacobsen Skanderup Anders
Department of Computational and Systems Biology, Agency for Science, Technology and Research, Genome Institute of Singapore, Singapore, Singapore.
Graduate School of Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore.
Bioinformatics. 2019 Sep 1;35(17):3157-3159. doi: 10.1093/bioinformatics/btz018.
Somatic Mutation calling method using a Random Forest (SMuRF) integrates predictions and auxiliary features from multiple somatic mutation callers using a supervised machine learning approach. SMuRF is trained on community-curated matched tumor and normal whole genome sequencing data. SMuRF predicts both single-nucleotide variants (SNVs) and indels with high accuracy in genome- or exome-level sequencing data. Furthermore, the method is robust across multiple tested cancer types and predicts low allele frequency variants with high accuracy. In contrast to existing ensemble-based somatic mutation calling approaches, SMuRF works out-of-the-box and is orders of magnitude faster.
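The integration step described above can be pictured as building one feature row per candidate variant, with a called/not-called flag and an auxiliary value from each caller. The Python sketch below is illustrative only and is not SMuRF's implementation (which is in R): the caller names, the record fields and the choice of allele frequency as the auxiliary feature are all assumptions made for this example. Rows of this kind are what a supervised classifier such as a random forest would take as input.

```python
from collections import defaultdict

# Hypothetical caller names, used for illustration only.
CALLERS = ("caller_a", "caller_b", "caller_c")


def build_feature_rows(calls_by_caller):
    """Merge per-caller calls into one feature row per candidate variant.

    calls_by_caller maps a caller name to a list of dicts with the keys
    chrom, pos, ref, alt and an optional auxiliary value "vaf"
    (assumed record layout, not a real caller output format).
    """
    # Group calls by variant identity so that the same variant reported
    # by several callers ends up in a single row.
    merged = defaultdict(dict)
    for caller, calls in calls_by_caller.items():
        for call in calls:
            key = (call["chrom"], call["pos"], call["ref"], call["alt"])
            merged[key][caller] = call

    rows = []
    for key in sorted(merged):
        per_caller = merged[key]
        row = {"variant": key}
        for caller in CALLERS:
            call = per_caller.get(caller)
            # Binary prediction flag plus one auxiliary feature per caller;
            # a caller that did not report the variant contributes zeros.
            row[f"{caller}_called"] = 1 if call is not None else 0
            row[f"{caller}_vaf"] = call.get("vaf", 0.0) if call is not None else 0.0
        row["n_callers"] = sum(row[f"{c}_called"] for c in CALLERS)
        rows.append(row)
    return rows


example = {
    "caller_a": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "T", "vaf": 0.12},
        {"chrom": "chr2", "pos": 200, "ref": "G", "alt": "GA", "vaf": 0.30},
    ],
    "caller_b": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "T", "vaf": 0.10},
    ],
    "caller_c": [],
}

rows = build_feature_rows(example)
for row in rows:
    print(row)
```

In this toy input the SNV on chr1 is reported by two callers and the indel on chr2 by one, so the per-caller flags and values differ between the two rows. A trained model would learn how much weight each caller's evidence deserves rather than relying on a fixed vote threshold.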
The method is implemented in R and available at https://github.com/skandlab/SMuRF. SMuRF operates as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline.
Supplementary data are available at Bioinformatics online.