Salamon H, Tarhio J, Rønningen K, Thomson G
Department of Integrative Biology, University of California, Berkeley 94720-3140, USA.
J Comput Biol. 1996 Fall;3(3):407-23. doi: 10.1089/cmb.1996.3.407.
The problem of defining combinations of variants unique to a sequence is efficiently addressed as a set-covering computation. The unique-combinations method is introduced, which identifies patterns in biological sequence data that distinguish one sequence from a group of other sequences. The method is further developed to describe features consistently present in one group of sequences but absent from a second group. The approach is incorporated into a novel analytical tool designed for studies of polymorphic sequence data, such as mitochondrial, human leukocyte antigen (HLA), or viral pathogen sequences. The unique-combinations method is well suited to applications in medical genetics and evolutionary genetics. In an example application, amino acid patterns isolated by the unique-combinations method from HLA class II DQA1-DQB1 patient and control genotypes yield greatly improved risk assessment for insulin-dependent diabetes mellitus (IDDM).
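The set-covering formulation can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. Sequences are pre-aligned strings of equal length. A pattern is a set of (position, residue) pairs. The solver is the standard greedy set-cover heuristic, which finds a small cover but not necessarily a minimum one. The function names `greedy_cover`, `unique_combination`, and `group_combination` are hypothetical. Each candidate position "covers" the other sequences that carry a different residue there, so a cover is a residue combination that no other sequence shares.

```python
from typing import Dict, Optional, Sequence


def greedy_cover(
    candidates: Dict[int, str], others: Sequence[str]
) -> Optional[Dict[int, str]]:
    """Greedy set cover (illustrative heuristic, not the paper's algorithm).

    Picks (position, residue) pairs from `candidates` until every sequence
    in `others` mismatches at least one chosen pair. Returns the pattern as
    {position: residue}, or None if some sequence cannot be excluded.
    """
    # Universe: indices of sequences that still match every chosen residue.
    uncovered = set(range(len(others)))
    # A position covers the sequences whose residue there differs.
    covers = {
        pos: {j for j, seq in enumerate(others) if seq[pos] != res}
        for pos, res in candidates.items()
    }
    pattern: Dict[int, str] = {}
    while uncovered:
        # Take the position excluding the most still-uncovered sequences;
        # max() returns the first maximum, so ties go to the lowest position.
        best = max(
            sorted(covers),
            key=lambda p: len(covers[p] & uncovered),
            default=None,
        )
        if best is None or not covers[best] & uncovered:
            return None  # remaining sequences share every candidate residue
        pattern[best] = candidates[best]
        uncovered -= covers[best]
    return pattern


def unique_combination(
    target: str, others: Sequence[str]
) -> Optional[Dict[int, str]]:
    """Residues of `target` that jointly distinguish it from all `others`."""
    return greedy_cover(dict(enumerate(target)), others)


def group_combination(
    group_a: Sequence[str], group_b: Sequence[str]
) -> Optional[Dict[int, str]]:
    """Residues shared by every sequence in group A whose combination
    occurs in no sequence of group B."""
    first = group_a[0]
    # Candidate features: positions conserved across all of group A.
    conserved = {
        pos: first[pos]
        for pos in range(len(first))
        if all(seq[pos] == first[pos] for seq in group_a)
    }
    return greedy_cover(conserved, group_b)


if __name__ == "__main__":
    print(unique_combination("ACGT", ["ACGA", "TCGT", "ACCT"]))
    print(group_combination(["ACGT", "ACGA"], ["TCGT", "ACCA"]))
```

For example, `unique_combination("ACGT", ["ACGA", "TCGT", "ACCT"])` returns `{0: 'A', 2: 'G', 3: 'T'}`. All three positions are needed because each other sequence differs from the target at exactly one of them. Restricting candidates to positions conserved in group A is one simple way to express "consistently present in one group"; the paper's own criteria may differ.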