Schwartz Russell
Department of Biological Sciences and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:90-7.
There is considerable interest in computational methods to assist in the use of genetic polymorphism data for locating disease-related genes. Haplotypes, contiguous sets of correlated variants, may provide a means of reducing the difficulty of the data analysis problems involved. The field to date has been dominated by methods based on the "haplotype block" hypothesis, which assumes discrete population-wide boundaries between conserved genetic segments, but there is strong reason to believe that haplotype blocks do not fully capture true haplotype conservation patterns. In this paper, we address the computational challenges of using a more flexible, block-free representation of haplotype structure called the "haplotype motif" model for downstream analysis problems. We develop algorithms for htSNP selection and missing data inference using this more generalized model of sequence conservation. Application to a dataset from the literature demonstrates the practical value of these block-free methods.