A nearest-neighboring-end algorithm for genetic mapping.

Author information

Crane Charles F, Crane Yan M

Affiliation

USDA-ARS and Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA.

Publication information

Bioinformatics. 2005 Apr 15;21(8):1579-91. doi: 10.1093/bioinformatics/bti164. Epub 2004 Nov 25.

Abstract

MOTIVATION

High-throughput methods are beginning to make possible the genotyping of thousands of loci in thousands of individuals, which could be useful for tightly associating phenotypes with candidate loci. Current mapping algorithms cannot handle so much data without building hierarchies of framework maps.

RESULTS

A version of Kruskal's minimum spanning tree algorithm can solve any genetic mapping problem that can be stated as marker deletion from a set of linkage groups. These include backcross, recombinant inbred, haploid and double-cross recombinational populations, in addition to conventional deletion and radiation hybrid populations. The algorithm progressively joins linkage groups at increasing recombination fractions between terminal markers, and attempts to recognize and correct erroneous joins at peaks in recombination fraction. The algorithm is O(mn³) for m individuals and n markers, but the mean run time scales close to mn². It is amenable to parallel processing and has recovered true map order in simulations of large backcross, recombinant inbred and deletion populations with up to 37,005 markers. Simulations were used to investigate map accuracy in response to population size, allelic dominance, segregation distortion, missing data and random typing errors. It produced accurate maps when marker distribution was sufficiently uniform, although segregation distortion could induce translocated marker orders. The algorithm was also used to map 1003 loci in the F7 ITMI population of bread wheat, Triticum aestivum L. emend Thell., where it shortened an existing standard map by 16%, but it failed to associate blocks of markers properly across gaps within linkage groups. This was because it depends upon the rankings of recombination fractions at individual markers, and is susceptible to sampling error, typing error and joint selection involving the terminal markers of nearly finished linkage groups. Therefore, the current form of the algorithm is useful mainly to improve local marker ordering in linkage groups obtained in other ways.
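The abstract describes the joining rule only in outline: marker pairs are considered in order of increasing recombination fraction, and two linkage groups are merged only through their terminal markers, as in Kruskal's algorithm. The Python sketch below illustrates that rule under stated assumptions; it is not the authors' distributed code. It assumes backcross genotypes coded 0/1 with `None` for missing calls, estimates the recombination fraction as the share of individuals typed at both markers whose calls differ, and uses a hypothetical cutoff `max_r` above which groups are left unjoined. It omits the paper's step that recognizes and corrects erroneous joins at recombination-fraction peaks. Note that computing all pairwise fractions alone takes on the order of m·n² operations for m individuals and n markers.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # assumed 0/1 backcross call; None = missing


def recombination_fraction(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Share of individuals, typed at both markers, whose calls differ."""
    informative = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not informative:
        return 0.5  # no shared data: treat the pair as unlinked (assumption)
    return sum(x != y for x, y in informative) / len(informative)


def nearest_end_groups(
    genotypes: Dict[str, Sequence[Genotype]], max_r: float = 0.35
) -> List[List[str]]:
    """Join linkage groups end-to-end at increasing recombination fraction.

    max_r is a hypothetical cutoff: pairs looser than this are never joined.
    """
    # Kruskal-style edge list: every marker pair, tightest linkage first.
    edges = sorted(
        (recombination_fraction(genotypes[i], genotypes[j]), i, j)
        for i, j in combinations(genotypes, 2)
    )
    group_of = {m: [m] for m in genotypes}  # each marker starts as its own group
    for r, i, j in edges:
        if r > max_r:
            break  # all remaining pairs are even more loosely linked
        gi, gj = group_of[i], group_of[j]
        if gi is gj:
            continue  # already in one group: joining would create a cycle
        if i not in (gi[0], gi[-1]) or j not in (gj[0], gj[-1]):
            continue  # only terminal markers of both groups may be joined
        if gi[-1] != i:
            gi.reverse()  # orient so that i sits at the tail of its group
        if gj[0] != j:
            gj.reverse()  # orient so that j sits at the head of its group
        merged = gi + gj
        for m in merged:
            group_of[m] = merged
    # Collect each distinct group once, in first-seen order.
    groups, seen = [], set()
    for g in group_of.values():
        if id(g) not in seen:
            seen.add(id(g))
            groups.append(g)
    return groups
```

Because each group is kept as an ordered list and joins happen only at its two ends, every linkage group stays a linear marker order rather than a general spanning tree. A marker whose recombination fraction with every other marker exceeds `max_r` ends up as a singleton group.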

AVAILABILITY

The source code and supplemental data are available at http://www.iubio.bio.indiana.edu/soft/molbio/qtl/flipper/

CONTACT

ccrane@purdue.edu.
