Ratnakumar Abhirami, Barris Wesley, McWilliam Sean, Brauning Rudiger, McEwan John C, Snelling Warren M, Dalrymple Brian P
CSIRO Livestock Industries, 306 Carmody Road, St. Lucia, QLD 4067, Australia.
BMC Genomics. 2009 Jan 23;10:46. doi: 10.1186/1471-2164-10-46.
In large genomics projects involving many different types of analyses of bacterial artificial chromosomes (BACs), such as fingerprinting, BAC end sequencing (BES) and full BAC sequencing, there are many opportunities for the identities of BACs to become confused. However, by comparing the results of the different analyses, inconsistencies can be identified, and a set of high-integrity BACs can be defined for use in future research.
The locations of each bovine BAC in the BAC fingerprint-based genome map and in the genome assembly were compared on the basis of the reported BESs and, for a smaller number of BACs, the full sequence. BACs with consistent positions in all three datasets were deemed to be correctly positioned. Where the full sequence was not available, BACs with consistent positions in both the fingerprint map and the BES-based alignments were also deemed correctly positioned. BACs whose BES-based and fingerprint-based locations agreed, but whose full BAC sequence placed them elsewhere, appeared to have been misidentified during sequencing; these included a number of apparently swapped BACs. Inconsistencies between BES-based and fingerprint map positions identified thirty-one plates from the CHORI-240 library that appear to have suffered substantial systematic problems during end-sequencing of the BACs. No systematic problems were identified in the fingerprinting of the BACs. Analysis of BACs overlapping in the assembly found a small overrepresentation of clones with substantial overlap in the library. It also found a substantial enrichment of highly overlapping BACs on the same plate in the CHORI-240 library. More than half of these BACs appear to have been present as duplicates on the original BAC library plates and should therefore be avoided in subsequent projects.
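The consistency rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes every placement (fingerprint map, BES pair and full sequence) has already been projected onto the same assembly coordinates. The tolerance, the plate-flagging thresholds, the overlap cutoff, and the data layout (the `Loc` record, integer plate numbers, per-BAC interval tuples) are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical tolerance: how far apart (bp) two placements may be and still "agree".
DEFAULT_TOL = 500_000


@dataclass(frozen=True)
class Loc:
    """A placement on the assembly: chromosome name plus coordinate in bp."""
    chrom: str
    pos: int


def consistent(a: Loc, b: Loc, tol: int = DEFAULT_TOL) -> bool:
    """Two placements agree if they are on the same chromosome and within tol bp."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= tol


def classify(fp: Loc, bes: Loc, full: Optional[Loc] = None,
             tol: int = DEFAULT_TOL) -> str:
    """Label one BAC by comparing its fingerprint-map, BES and full-sequence placements."""
    if not consistent(fp, bes, tol):
        return "fp_bes_conflict"          # candidate end-sequencing (or fingerprint) error
    if full is None or consistent(bes, full, tol):
        return "correct"                  # all available datasets agree
    return "misidentified_in_sequencing"  # fp and BES agree, full sequence does not


def flag_plates(records: Iterable[Tuple[int, str]],
                min_rate: float = 0.5, min_bacs: int = 10) -> List[int]:
    """Plates whose fraction of fp/BES conflicts is at least min_rate (hypothetical cutoffs)."""
    totals: Dict[int, int] = defaultdict(int)
    conflicts: Dict[int, int] = defaultdict(int)
    for plate, label in records:
        totals[plate] += 1
        if label == "fp_bes_conflict":
            conflicts[plate] += 1
    return sorted(p for p, n in totals.items()
                  if n >= min_bacs and conflicts[p] / n >= min_rate)


def find_swaps(bes: Dict[str, Loc], full: Dict[str, Loc],
               tol: int = DEFAULT_TOL) -> List[Tuple[str, str]]:
    """Pairs (a, b) where each full sequence sits at the other's BES placement, not its own."""
    return [(a, b) for a, b in combinations(sorted(full), 2)
            if a in bes and b in bes
            and not consistent(full[a], bes[a], tol)
            and not consistent(full[b], bes[b], tol)
            and consistent(full[a], bes[b], tol)
            and consistent(full[b], bes[a], tol)]


def same_plate_duplicates(intervals: Dict[str, Tuple[int, str, int, int]],
                          min_overlap: float = 0.9) -> List[Tuple[str, str]]:
    """Same-plate BAC pairs whose intervals (plate, chrom, start, end) overlap by at
    least min_overlap of the shorter clone (a hypothetical 'highly overlapping' cutoff)."""
    hits = []
    for a, b in combinations(sorted(intervals), 2):
        pa, ca, sa, ea = intervals[a]
        pb, cb, sb, eb = intervals[b]
        if pa != pb or ca != cb:
            continue
        overlap = min(ea, eb) - max(sa, sb)
        shorter = min(ea - sa, eb - sb)
        if shorter > 0 and overlap / shorter >= min_overlap:
            hits.append((a, b))
    return hits


if __name__ == "__main__":
    labels = [(1, "correct")] * 10 + [(2, "fp_bes_conflict")] * 8 + [(2, "correct")] * 2
    print(flag_plates(labels))  # [2]
```

The pairwise comparisons in `find_swaps` and `same_plate_duplicates` are quadratic. That is acceptable for the small sets of misidentified or same-plate clones involved. A whole-library overlap scan would instead sort clones by chromosome and start coordinate.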
Our analysis shows that approximately 95% of the bovine CHORI-240 library clones with both a BAC fingerprint and two BESs mapping to the genome in the expected orientations (approximately 27% of all BACs) have consistent locations in the BAC fingerprint map and the genome assembly. We have developed a broadly applicable methodology for checking the integrity of BAC-based datasets even where only incomplete and partially assembled genomic sequence is available.