Crohn disease risk prediction-Best practices and pitfalls with exome data.

Authors

Giollo Manuel, Jones David T, Carraro Marco, Leonardi Emanuela, Ferrari Carlo, Tosatto Silvio C E

Affiliations

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Department of Biomedical Sciences, University of Padova, Padova, Italy.

Publication details

Hum Mutat. 2017 Sep;38(9):1193-1200. doi: 10.1002/humu.23177. Epub 2017 Mar 21.

Abstract

The Critical Assessment of Genome Interpretation (CAGI) experiment is the first attempt to evaluate the state-of-the-art in genetic data interpretation. Among the proposed challenges, Crohn disease (CD) risk prediction has become the most classic problem spanning three editions. The scientific question is very hard: can anybody assess the risk to develop CD given the exome data alone? This is one of the ultimate goals of genetic analysis, which motivated most CAGI participants to look for powerful new methods. In the 2016 CD challenge, we implemented all the best methods proposed in the past editions. This resulted in 10 algorithms, which were evaluated fairly by CAGI organizers. We also used all the data available from CAGI 11 and 13 to maximize the amount of training samples. The most effective algorithms used known genes associated with CD from the literature. No method could evaluate effectively the importance of unannotated variants by using heuristics. As a downside, all CD datasets were strongly affected by sample stratification. This affected the performance reported by assessors. Therefore, we expect that future datasets will be normalized in order to remove population effects. This will improve methods comparison and promote algorithms focused on causal variants discovery.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c703/5509518/590bb197bf6c/nihms845020f1.jpg
