Ecological Genomics Institute, Division of Biology, Kansas State University, Manhattan, KS 66506, USA.
BMC Bioinformatics. 2011 May 21;12:183. doi: 10.1186/1471-2105-12-183.
Hybridization of heterologous (non-specific) nucleic acids onto arrays designed for model organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in non-model organisms. However, conventional normalization methods that assume equivalent distributions (such as quantile normalization) are inappropriate for heterologous hybridization. We propose an algorithm for normalizing and centering intensity data from heterologous hybridization that makes no prior assumptions about the distribution, reduces the false appearance of homology, and provides a way for researchers to confirm whether heterologous hybridization is suitable.
Data are normalized by adjusting for the Gibbs free energy of binding, and centered by adjusting for the median of a common set of control probes assumed to be equivalently dissimilar for all species. This procedure was compared to existing approaches using two related nematode species, Caenorhabditis elegans and C. briggsae. It was as successful as Loess normalization at detecting sequence variations (deletions) and more successful than quantile normalization at reducing the accumulation of false positive probe matches. Despite these improvements, probe fluorescence intensity was still too poorly correlated with sequence similarity to allow reliable detection of matching probe sequences.
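The two steps described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the free-energy adjustment, so this example assumes a simple linear fit of log intensity against each probe's predicted Gibbs free energy of binding and keeps the residuals; that choice, along with the function names `fit_line` and `normalize_and_center` and the input layout, is hypothetical and not taken from the paper.

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_and_center(log_intensity, delta_g, is_control):
    """Return free-energy-adjusted, control-centered intensities for one array.

    log_intensity: log-scale probe intensities
    delta_g:       predicted Gibbs free energy of binding per probe
    is_control:    True for probes in the shared control set
    """
    # Step 1 (assumed form): remove the linear trend of intensity on free energy.
    slope, intercept = fit_line(delta_g, log_intensity)
    residuals = [
        y - (slope * g + intercept) for y, g in zip(log_intensity, delta_g)
    ]
    # Step 2: center on the median of the control probes, which are assumed
    # to be equally dissimilar to every species hybridized.
    control_median = median(r for r, c in zip(residuals, is_control) if c)
    return [r - control_median for r in residuals]


# Illustrative values only; these are not data from the study.
example = normalize_and_center(
    log_intensity=[4.2, 3.1, 2.9, 5.0, 3.6],
    delta_g=[-30.0, -25.0, -20.0, -35.0, -28.0],
    is_control=[True, True, False, False, True],
)
```

Because each array is adjusted against its own free-energy trend and its own control probes, no array is forced to share an intensity distribution with another, which is the property the abstract contrasts with quantile normalization.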
Cross-species hybridization can be a way to adapt genome-enabled tools for closely related non-model organisms, but data must be normalized and centered in a way that accommodates hybridization of nucleic acids with diverged sequences. For short 25-mer probes, hybridization intensity alone may be insufficiently correlated with sequence similarity to allow reliable inference of homology at the probe level.