A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.

Authors

Liu Yongzhuang, Li Bingshan, Tan Renjie, Zhu Xiaolin, Wang Yadong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; Center for Human Genome Variation, Duke University, Durham, NC 27708; and Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA.

Publication information

Bioinformatics. 2014 Jul 1;30(13):1830-6. doi: 10.1093/bioinformatics/btu141. Epub 2014 Mar 10.

Abstract

MOTIVATION

Whole-genome and whole-exome sequencing of parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates because of sequencing artifacts and alignment issues: they may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.

RESULTS

In this article, we curated 59 sequence features from whole-genome and exome alignment contexts that are considered relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set from experimentally validated true and false de novo mutations, together with false de novo mutations collected from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated the relative importance of different sequence features to classification accuracy. Finally, we applied DNMFilter to our in-house whole-exome trios and one CEU trio from the 1000 Genomes Project and found that DNMFilter can be coupled with commonly used de novo mutation detection approaches as an effective filter that significantly reduces the false discovery rate without sacrificing sensitivity.
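
The abstract does not list the 59 features or the boosting configuration, so the following is only a minimal sketch of the general idea, not DNMFilter itself (which is implemented in Java and R). It assumes two hypothetical features (`allele_balance` in the child and `parent_alt_reads`), synthetic toy data, decision stumps as base learners, and a simplified boosting update for logistic loss. It then reports sensitivity and false discovery rate for the filtered calls.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_candidates(n, rng):
    """Synthetic candidates with two HYPOTHETICAL features (not from the paper)."""
    X, y = [], []
    for i in range(n):
        is_true = i % 2 == 0  # balanced classes by construction
        if is_true:
            balance = rng.gauss(0.50, 0.06)          # roughly heterozygous child
            parent_alt = rng.choice([0, 0, 0, 1])    # little alt evidence in parents
        else:
            balance = rng.gauss(0.15, 0.06)          # skewed allele balance
            parent_alt = rng.choice([0, 1, 2, 3])
        X.append([min(1.0, max(0.0, balance)), float(parent_alt)])
        y.append(1 if is_true else 0)
    return X, y


def fit_stump(X, residuals):
    """Least-squares fit of a one-split regression stump to the residuals."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def stump_predict(stump, row):
    j, thr, left_val, right_val = stump
    return left_val if row[j] <= thr else right_val


def train_gbm(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting for logistic loss: each stump fits y - sigmoid(score)."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1.0 - pos))  # initial log-odds
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]  # negative gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_predict(s, row) for s in stumps))


def sensitivity_and_fdr(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sens, fdr


rng = random.Random(0)
X_train, y_train = make_toy_candidates(120, rng)
X_test, y_test = make_toy_candidates(60, rng)
model = train_gbm(X_train, y_train)
y_pred = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X_test]
sens, fdr = sensitivity_and_fdr(y_test, y_pred)
print(f"sensitivity={sens:.2f}  FDR={fdr:.2f}")
```

In a filtering setting like the one described, the probability cutoff (0.5 here, an arbitrary choice) is the knob that trades sensitivity against false discovery rate; a real pipeline would more likely rely on an established gradient-boosting library than on a hand-written loop like this one.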

AVAILABILITY

The software DNMFilter, implemented in a combination of Java and R, is freely available at http://humangenome.duke.edu/software.
