Hassanin Alexandre, Bonillo Céline, Nguyen Bui Xuan, Cruaud Corinne
Département Systématique et Evolution, UMR 7205 - Origine, Structure et Evolution de la Biodiversité, Muséum national d'Histoire naturelle (MNHN), Paris, France.
Mitochondrial DNA. 2010 Jun;21(3-4):68-76. doi: 10.3109/19401736.2010.490583.
In the present study, we amplified and sequenced the complete mitochondrial genome from a Vietnamese domestic goat (Capra hircus). The data were compared with mtDNA sequences available in the nucleotide databases.
The results revealed many problems in the goat mitochondrial reference genome (GenBank accession number NC_005044). Firstly, the authors of the reference sequence did not sequence the complete genome, but only 44.5% of its total length. Secondly, two fragments (representing 1201 and 2384 nt) contained an unusually high percentage of sequencing errors. Thirdly, a segment of 1881 nt, covering most of nd5 and the 5' part of nd6, was shown to be a nuclear sequence of mitochondrial origin (Numt). Surprisingly, a similar Numt was also detected in four other goat mitochondrial genomes available in GenBank (GU22978-81). Two primers were designed specifically to amplify approximately 960 nt of the Numt identified in goat mtDNA genomes. After cloning, two Numts were detected for C. hircus. Several Numts, most of them with stop codon or frameshift mutations, were also found in Hemitragus jemlahicus (Himalayan tahr) and Pseudois nayaur (bharal). Phylogenetic analyses suggest that a nuclear integration occurred in the common ancestor of Ammotragus, Arabitragus, Capra, Hemitragus and Pseudois, followed by several subsequent duplication events.
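The stop codon and frameshift signals mentioned above can be illustrated with a minimal Python sketch. This is not the authors' method, only one simple way to screen a candidate protein-coding fragment such as nd5. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA and AGG are stop codons; the function names and the toy sequences are hypothetical.

```python
# Stop codons in the vertebrate mitochondrial genetic code
# (NCBI translation table 2); assumed here, not stated in the abstract.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def premature_stops(seq, frame=0):
    """Return 0-based codon indices of in-frame stop codons,
    ignoring the last complete codon (a legitimate terminal stop)."""
    seq = seq.upper()
    # Split into complete codons; trailing 1-2 nt are ignored.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in MITO_STOPS]


def indel_suggests_frameshift(gap_length):
    """An insertion or deletion whose length is not a multiple of 3
    shifts the reading frame downstream of it."""
    return gap_length % 3 != 0


# Toy example (hypothetical sequences, not real goat data).
functional_like = "ATGCCCTAA"     # no internal stop
numt_like = "ATGAGACCCTAA"        # AGA is an internal stop in table 2
print(premature_stops(functional_like))
print(premature_stops(numt_like))
```

A real screen would work on an alignment against a trusted mitochondrial sequence, and would need to allow for the incomplete terminal stop codons (T or TA) that some vertebrate mitochondrial genes carry.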
As poor-quality sequences can produce misleading interpretations of both phylogeny and molecular evolution, we propose including a new link to each accession number in the nucleotide databases, named "external expertise", which could be openly and continually updated by non-anonymous researchers in order to validate good-quality data, or, conversely, to indicate possible problems in the sequence, such as DNA contamination or sequencing errors. This information could prove very useful over time to select good-quality sequences for in silico analyses.