School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
PLoS One. 2012;7(7):e42154. doi: 10.1371/journal.pone.0042154. Epub 2012 Jul 27.
BACKGROUND: The composition vector (CV) method has proven to be a reliable and fast alignment-free method for analyzing large COI barcoding data. In this study, we modify the method to analyze multi-gene datasets for plant DNA barcoding. The modified method applies an adjustable weighting to the vector distance for each pair of taxa, with the weights set according to the ratio of the sequence lengths of the candidate genes.
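The idea of a length-weighted, multi-gene vector distance can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the published C++ source: plain k-mer frequencies stand in for the full composition vector, the per-gene distance is (1 - cosine similarity)/2, each gene's weight is its share of the combined sequence length in the pair, and the names `kmer_vector`, `cosine_distance`, and `weighted_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k=3):
    """Relative frequency of every DNA k-mer in seq (a simplified stand-in
    for a composition vector; k-mers with non-ACGT characters are ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def cosine_distance(u, v):
    """Distance in [0, 1] derived from cosine similarity (assumed form)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no usable k-mers: treat as maximally distant
    return min(1.0, max(0.0, (1.0 - dot / (norm_u * norm_v)) / 2.0))


def weighted_distance(taxon_a, taxon_b, k=3):
    """Combine per-gene distances for one pair of taxa.

    taxon_a and taxon_b map gene names (e.g. "matK") to sequences.
    Assumption: a gene's weight is its share of the total sequence
    length contributed by both taxa across the shared genes.
    """
    shared = sorted(set(taxon_a) & set(taxon_b))
    if not shared:
        raise ValueError("taxa share no genes")
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in shared}
    total = sum(lengths.values())
    return sum(
        (lengths[g] / total)
        * cosine_distance(kmer_vector(taxon_a[g], k), kmer_vector(taxon_b[g], k))
        for g in shared
    )
```

Because the weights are recomputed for every pair of taxa, a gene that is short or partially sequenced in one pair contributes less to that pair's distance without affecting other pairs. The resulting pairwise distances could then be passed to any distance-based tree-building method.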
METHODOLOGY/PRINCIPAL FINDINGS: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. In all three, the success rate of grouping sequences at the genus/species level was higher with the modified CV approach than with the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and the 397-sequence dataset; for the matK+rbcL+trnH-psbA dataset, it outperformed the K2P/NJ approach by 16.7%.
CONCLUSIONS/SIGNIFICANCE: We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. The source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.