Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges.

Authors

Cai Binghuang, Li Biao, Kiga Nikki, Thusberg Janita, Bergquist Timothy, Chen Yun-Ching, Niknafs Noushin, Carter Hannah, Tokheim Collin, Beleva-Guthrie Violeta, Douville Christopher, Bhattacharya Rohit, Yeo Hui Ting Grace, Fan Jean, Sengupta Sohini, Kim Dewey, Cline Melissa, Turner Tychele, Diekhans Mark, Zaucha Jan, Pal Lipika R, Cao Chen, Yu Chen-Hsin, Yin Yizhou, Carraro Marco, Giollo Manuel, Ferrari Carlo, Leonardi Emanuela, Tosatto Silvio C E, Bobe Jason, Ball Madeleine, Hoskins Roger A, Repo Susanna, Church George, Brenner Steven E, Moult John, Gough Julian, Stanke Mario, Karchin Rachel, Mooney Sean D

Affiliations

Department of Biomedical Informatics & Medical Education, University of Washington School of Medicine, Seattle, Washington.

The Buck Institute for Research on Aging, Novato, California.

Publication information

Hum Mutat. 2017 Sep;38(9):1266-1276. doi: 10.1002/humu.23265. Epub 2017 Jun 19.

DOI: 10.1002/humu.23265

PMID: 28544481

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5645203/

Abstract

The advent of next-generation sequencing has dramatically decreased the cost for whole-genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.
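Two of the assessment ideas named in the abstract, ROC AUC and significance by simulation, can be illustrated with a minimal sketch. This is not the assessors' actual code: the function names `roc_auc` and `matching_p_value` are hypothetical, and the simulation assumes a simple null model in which genomes are assigned to trait profiles uniformly at random, one-to-one.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen
    positive case is scored above a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def matching_p_value(n_profiles, n_correct, n_sim=10000, seed=0):
    """Empirical probability that a random one-to-one assignment of
    genomes to profiles gets at least n_correct matches right.
    ASSUMPTION: uniform random permutation as the null model."""
    rng = random.Random(seed)
    identity = list(range(n_profiles))
    hits = 0
    for _ in range(n_sim):
        perm = identity[:]
        rng.shuffle(perm)
        correct = sum(1 for i, j in enumerate(perm) if i == j)
        if correct >= n_correct:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical trait predictions: 1 = has trait, 0 = does not.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
    # Hypothetical matching result: 5 of 20 profiles matched correctly.
    print(matching_p_value(20, 5, n_sim=2000))
```

Under this null model a random assignment gets about one match right on average regardless of the number of profiles, so even a handful of correct matches can be statistically significant.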

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e6c/5645203/d2e0e075ba95/nihms889883f1.jpg
