Novel approach for parallelizing pairwise comparison problems as applied to detecting segments identical by decent in whole-genome data.

Author information

Sapin Emmanuel, Keller Matthew C

Affiliations

Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO 80309, USA.

Psychology & Neuroscience Department, University of Colorado Boulder, Boulder, CO, USA.

Publication information

Bioinformatics. 2021 Aug 9;37(15):2121-2125. doi: 10.1093/bioinformatics/btab084.

DOI:10.1093/bioinformatics/btab084

PMID:33705528

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8352502/

Abstract

MOTIVATION

Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, so operations that require pairwise comparisons (on pairs of SNPs or on pairs of individuals) are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n^2 comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization.
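
The abstract does not say how the decomposition is built, so the following is only a minimal sketch of one generic way to split an all-pairs problem into independent jobs, not the authors' algorithm. It assumes the n items are cut into consecutive blocks of roughly sqrt(n) items, with one job per pair of blocks, so that each job performs at most about n comparisons. The function names (`make_blocks`, `make_jobs`, `pairs_in_job`) and the block size are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import isqrt


def make_blocks(n, block_size):
    """Split indices 0..n-1 into consecutive blocks of at most block_size items."""
    return [range(start, min(start + block_size, n))
            for start in range(0, n, block_size)]


def make_jobs(blocks):
    """One job per unordered pair of blocks, including each block paired with itself."""
    k = len(blocks)
    return [(i, j) for i in range(k) for j in range(i, k)]


def pairs_in_job(blocks, job):
    """List the item pairs that a single job is responsible for comparing."""
    i, j = job
    if i == j:
        # Within-block job: every unordered pair inside the block.
        return list(combinations(blocks[i], 2))
    # Cross-block job: every item of block i against every item of block j.
    return [(a, b) for a in blocks[i] for b in blocks[j]]


# Illustrative sizes only (assumption): 100 items, blocks of isqrt(100) = 10.
n = 100
blocks = make_blocks(n, isqrt(n))
jobs = make_jobs(blocks)

# Each job is independent, so in practice each could run on a separate worker.
all_pairs = [pair for job in jobs for pair in pairs_in_job(blocks, job)]
largest_job = max(len(pairs_in_job(blocks, job)) for job in jobs)
```

Under these assumptions every pair of items is assigned to exactly one job, and no job needs more than two blocks of data in memory at a time, which is the kind of property that makes per-job time and memory much smaller than for the full problem.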

RESULTS

We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science.

AVAILABILITY AND IMPLEMENTATION

A GitHub page is available at https://github.com/emmanuelsapin with the code to generate data needed for the implementation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a92b/8352502/3e4a955835a4/btab084f1.jpg
