A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.

Authors

Liu Yongzhuang, Li Bingshan, Tan Renjie, Zhu Xiaolin, Wang Yadong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; Center for Human Genome Variation, Duke University, Durham, NC 27708, USA; and Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA.

Publication information

Bioinformatics. 2014 Jul 1;30(13):1830-6. doi: 10.1093/bioinformatics/btu141. Epub 2014 Mar 10.

DOI:10.1093/bioinformatics/btu141

PMID:24618463

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4071207/

Abstract

MOTIVATION

Whole-genome and -exome sequencing on parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates due to sequencing artifacts and alignment issues, which may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.
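The three error measures this abstract relies on (sensitivity, specificity and false discovery rate) can be written out from a confusion table. The sketch below is only an illustration of the arithmetic: the function name `filter_metrics` and all counts are hypothetical and are not taken from the paper.

```python
def filter_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, false discovery rate) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true de novo mutations that are called
    specificity = tn / (tn + fp)  # fraction of artifacts that are correctly rejected
    fdr = fp / (tp + fp)          # fraction of reported calls that are false
    return sensitivity, specificity, fdr


# Hypothetical counts, chosen only to show a caller with high sensitivity
# but poor specificity; they are not results from the paper.
sens, spec, fdr = filter_metrics(tp=48, fp=450, tn=550, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} FDR={fdr:.2f}")
```

With counts like these, nearly all true mutations are recovered, yet most reported calls are false, which is the situation a downstream filter is meant to address.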

RESULTS

In this article, we curated 59 sequence features from whole-genome and exome alignments that are considered relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set from experimentally validated true and false de novo mutations, together with false de novo mutations collected from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated the relative importance of different sequence features for classification accuracy. Finally, we applied DNMFilter to our in-house whole-exome trios and to one CEU trio from the 1000 Genomes Project, and found that DNMFilter can be coupled with commonly used de novo mutation detection approaches as an effective filter that significantly reduces the false discovery rate without sacrificing sensitivity.
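To make the idea of a gradient-boosting filter concrete, here is a minimal sketch in plain Python. It is not the DNMFilter implementation (which the authors describe as Java and R). It boosts one-split decision stumps on the log-loss gradient, sets each leaf to the mean pseudo-residual for simplicity, and trains on synthetic candidates. The three features (child alternate-allele fraction, parental alternate-read count, mapping quality) and their distributions are assumptions made for illustration; the abstract does not list the 59 features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the (feature, threshold, left_mean, right_mean) split that best
    fits the residuals in the least-squares sense, or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]


def train(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on log loss; assumes both classes are present in y."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to the score: y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lm if row[j] <= t else rm)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    """Probability that a candidate is a true de novo mutation."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(score)


def make_candidate(rng, is_true):
    """Synthetic candidate: [child alt fraction, parental alt reads, mapping quality].
    Feature choice and distributions are hypothetical."""
    if is_true:
        return [rng.gauss(0.48, 0.08), rng.choice([0, 0, 0, 1]), rng.gauss(58, 3)]
    return [rng.gauss(0.22, 0.10), rng.choice([0, 1, 2, 3]), rng.gauss(45, 10)]


rng = random.Random(0)
y = [i % 2 for i in range(120)]  # balanced synthetic labels: 1 = true, 0 = artifact
X = [make_candidate(rng, label == 1) for label in y]
model = train(X, y)
predictions = [int(predict_proba(model, row) >= 0.5) for row in X]
train_accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {train_accuracy:.2f}")
```

In practice the candidates would come from a de novo caller and the labels from experimental validation, as the abstract describes. Accuracy on the training data, as printed here, says little about performance on unseen trios, so a held-out set would be needed for any real evaluation.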

AVAILABILITY

DNMFilter, implemented in a combination of Java and R, is freely available at http://humangenome.duke.edu/software.
