Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA.
National Institute of Molecular Biology & Biotechnology, National Science Complex, College of Science, University of the Philippines, Diliman, Quezon City, Philippines.
Gene. 2018 Jul 15;663:165-177. doi: 10.1016/j.gene.2018.04.024. Epub 2018 Apr 12.
Loblolly pine (LP; Pinus taeda L.) is an economically and ecologically important tree in the southeastern U.S. To advance understanding of the LP genome, we sequenced and analyzed 100 bacterial artificial chromosome (BAC) clones and performed a Cot analysis. The Cot analysis indicates that the genome is composed of 57% highly repetitive, 24% moderately repetitive, and 10% single/low-copy sequences; the remaining 9% is a combination of foldback and damaged DNA. Although single/low-copy DNA accounts for only 10% of the LP genome, that fraction is still 14 times the size of the entire Arabidopsis genome. Since gene numbers in LP are similar to those in Arabidopsis, much of the single/low-copy DNA of LP appears to be both gene- and repeat-poor. Macroarrays prepared from an LP BAC library were hybridized with probes designed from cell wall synthesis/wood development cDNAs, and 50 of these "targeted" clones were selected for further analysis. An additional 25 clones were selected because they contained few repeats, and 25 more were selected at random. The 100 BAC clones were Sanger sequenced and assembled. Of the targeted BACs, 80% contained all or part of the cDNA used to target them. One targeted BAC was found to contain fungal DNA and was eliminated from further analysis. A combination of similarity-based and ab initio gene prediction approaches was used to identify and characterize potential coding regions in the 99 BACs containing LP DNA. From this analysis, we identified 154 gene models (GMs) representing both putative protein-coding genes and likely pseudogenes. Ten of the GMs (all of which were specifically targeted) had enough support to be classified as intact genes.
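The arithmetic behind the Cot-based composition figures above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the percentages and the factor of 14 come from the abstract, while the function names and the 1,000 Mb genome used in the usage line are hypothetical. The abstract does not state the genome sizes behind the comparison, so the sketch derives only the implied ratio between the two genomes.

```python
# Cot component percentages as reported in the abstract.
COT_PERCENT = {
    "highly_repetitive": 57.0,
    "moderately_repetitive": 24.0,
    "single_low_copy": 10.0,
    "foldback_or_damaged": 9.0,
}

# Reported in the abstract: LP single/low-copy DNA is 14 times the
# size of the whole Arabidopsis genome.
SINGLE_COPY_VS_ARABIDOPSIS = 14.0


def implied_genome_ratio(single_copy_percent, single_copy_ratio):
    """Whole-genome size ratio (LP / Arabidopsis) implied by the two figures.

    If a fraction f of the LP genome equals r Arabidopsis genomes,
    the whole LP genome equals r / f Arabidopsis genomes.
    """
    return single_copy_ratio * 100.0 / single_copy_percent


def component_sizes_mb(genome_mb, percents):
    """Split a genome size (in Mb) into its Cot components."""
    return {name: genome_mb * pct / 100.0 for name, pct in percents.items()}


total_percent = sum(COT_PERCENT.values())
ratio = implied_genome_ratio(
    COT_PERCENT["single_low_copy"], SINGLE_COPY_VS_ARABIDOPSIS
)

# Hypothetical usage: a made-up 1,000 Mb genome, NOT the LP genome size.
example_sizes = component_sizes_mb(1000.0, COT_PERCENT)
```

Under these figures the four components sum to 100%, and the whole LP genome works out to roughly 140 times the Arabidopsis genome size used for the comparison (14 / 0.10).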
Interestingly, GM density was statistically indistinguishable (α = 0.05) between the targeted and random BAC clones (15.18 and 12.61 GM/Mb, respectively), whereas the low-repeat BACs contained significantly fewer GMs (7.08 GM/Mb). However, when GM length was considered, the targeted BACs had a significantly greater percentage of their length in GMs (3.26%) than the random (1.63%) and low-repeat (0.62%) BACs. The results of our study provide insight into LP evolution and inform ongoing efforts to produce a reference genome sequence for LP, while characterization of genes involved in cell wall production highlights carbon metabolism pathways that can be leveraged to increase wood production.
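The two summary statistics reported above, GM density (GM/Mb) and the percentage of sequence length in GMs, can be computed as in the sketch below. The abstract does not give per-set sequence lengths or GM lengths, and it does not say whether values were pooled per set or averaged per BAC. The sketch therefore assumes pooling over a set and non-overlapping GMs. The `BacSet` class and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacSet:
    """A group of BAC clones (e.g. targeted, random, or low-repeat)."""
    name: str
    total_bp: int              # summed assembled length of all BACs in the set
    gm_lengths_bp: List[int]   # length of each gene model found in the set


def gm_per_mb(bac_set: BacSet) -> float:
    """Gene models per megabase of sequence (pooled over the set)."""
    return len(bac_set.gm_lengths_bp) / (bac_set.total_bp / 1_000_000)


def percent_length_in_gms(bac_set: BacSet) -> float:
    """Percentage of sequence covered by gene models.

    Assumes gene models do not overlap; overlapping models would be
    double-counted by this simple sum.
    """
    return 100.0 * sum(bac_set.gm_lengths_bp) / bac_set.total_bp


# Hypothetical usage with made-up numbers (not data from the study).
demo = BacSet(name="demo", total_bp=2_000_000,
              gm_lengths_bp=[1500, 2500, 4000, 2000])
demo_density = gm_per_mb(demo)
demo_percent = percent_length_in_gms(demo)
```

Keeping the two measures separate reflects the contrast in the abstract: sets can have similar GM counts per Mb yet differ in how much of their length the GMs occupy, which is what happens when one set holds longer gene models.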