The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors.

Affiliations

Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA ; Medical Scientist Training Program, University of Cincinnati College of Medicine, Cincinnati OH, USA.

Division of Rheumatology, Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati OH, USA ; Department of Veterans Affairs, Veterans Affairs Medical Center - Cincinnati, Cincinnati OH, USA.

Publication information

Front Genet. 2014 Feb 12;5:16. doi: 10.3389/fgene.2014.00016. eCollection 2014.

DOI:10.3389/fgene.2014.00016
PMID:24575121
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3921572/
Abstract

Next-generation sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge to these studies is the generation of sequencing artifacts by current technologies. To identify and characterize the properties that distinguish false-positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and that most likely indicate sequencing artifacts. We examined quality control measurements and found that three of them identified the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements removed ~95% of the Mendelian errors while retaining 80% of the called variants. These filters were applied independently. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared with 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next-generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI), which stores sequencing files and metadata (e.g., relatedness information), handles file versioning, data filtering, and variant annotation, and identifies candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models.
We conclude that the data cleaning process improves the signal-to-noise ratio in terms of variants and facilitates the identification of candidate disease-causative polymorphisms.
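
The trio logic described in the abstract can be illustrated with a minimal Python sketch. This is not the CASSI implementation. The `Call` record, the function names, and every threshold (`min_depth`, `min_gq`, and the `ALT_RATIO_RANGES` bounds) are hypothetical placeholders chosen for illustration, since the abstract does not state the cutoffs used. Genotypes are assumed to be autosomal, diploid, and written as VCF-style allele indices (0 = reference).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Call:
    """One genotype call at one site (hypothetical record for illustration)."""
    genotype: tuple   # VCF-style allele indices, e.g. (0, 1); 0 = reference
    depth: int        # total read depth at the site
    gq: int           # genotype quality score
    alt_reads: int    # reads supporting a non-reference allele


def is_mendelian_error(child: Call, mother: Call, father: Call) -> bool:
    """True if no pairing of one maternal and one paternal allele
    reproduces the child's genotype (autosomal, diploid sites only)."""
    target = sorted(child.genotype)
    return all(
        sorted(pair) != target
        for pair in product(mother.genotype, father.genotype)
    )


# Placeholder bounds on the alternate allele ratio, keyed by the number of
# non-reference alleles in the called genotype. These are assumptions, not
# values taken from the paper.
ALT_RATIO_RANGES = {0: (0.0, 0.15), 1: (0.30, 0.70), 2: (0.85, 1.0)}


def qc_flags(call: Call, min_depth: int = 15, min_gq: int = 20) -> dict:
    """Evaluate each quality filter separately so that each one can be
    assessed on its own. Thresholds are illustrative placeholders."""
    n_alt = sum(1 for allele in call.genotype if allele != 0)
    ratio = call.alt_reads / call.depth if call.depth else 0.0
    low, high = ALT_RATIO_RANGES[n_alt]
    return {
        "depth": call.depth >= min_depth,
        "gq": call.gq >= min_gq,
        "alt_ratio": low <= ratio <= high,
    }


if __name__ == "__main__":
    mother = Call((0, 0), depth=42, gq=99, alt_reads=0)
    father = Call((0, 0), depth=38, gq=99, alt_reads=0)
    # A heterozygous child call with two homozygous-reference parents is a
    # Mendelian error: either a true de novo variant or, more often, an artifact.
    child = Call((0, 1), depth=8, gq=12, alt_reads=1)
    print(is_mendelian_error(child, mother, father))
    print(qc_flags(child))
```

Returning one flag per measurement, instead of a single pass/fail value, makes it possible to count how many Mendelian errors each filter removes on its own before combining them.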

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/d1f559cf84ec/fgene-05-00016-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/ba4457e56101/fgene-05-00016-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/d661cddd1af4/fgene-05-00016-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/28eaef449673/fgene-05-00016-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/a226e1739b8b/fgene-05-00016-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/e76ef4d26822/fgene-05-00016-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faeb/3921572/0dc95c62a0a3/fgene-05-00016-g007.jpg
