Using mtDNA sequences to estimate SNP parameters in ESTs.

Author information

Reed Kent M

Affiliation

Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, St. Paul, Minnesota 55108, USA.

Publication information

Anim Biotechnol. 2008;19(3):166-77. doi: 10.1080/10495390802170916.

PMID:18607789

Abstract

Discovery of single nucleotide polymorphisms (SNPs) requires analysis of redundant sequences, such as those available in large public databases. The ability to detect SNPs, especially those of low frequency, depends on the depth and scale of the discovery effort. Large numbers of SNPs have been identified by mining large-scale EST surveys and whole-genome sequencing projects. These surveys, however, are subject to ascertainment bias and to the errors inherent in large-scale single-pass sequencing. For example, the number of steps involved in constructing and sequencing cDNA libraries makes ESTs highly error-prone, which increases the frequency of nonvalid SNPs obtained in these surveys. Sequences of mtDNA genes are often incorporated into cDNA libraries as an artifact of library construction and are typically either subtracted from the libraries or considered superfluous when the information content of EST datasets is evaluated. Nevertheless, these mtDNA sequences provide a unique resource for analyzing SNP parameters in EST projects. This study uses sequences from four turkey muscle cDNA libraries to demonstrate how mtDNA sequences gleaned from EST collections can be used to estimate SNP parameters and thus help predict the validity of SNPs.
