Albert Paul S, Hunsberger Sally A, Hu Nan, Taylor Philip R
Biometric Research Branch, National Cancer Institute, 6130 Executive Blvd, Room 8136, Bethesda, MD 20892, USA.
Biostatistics. 2004 Oct;5(4):515-29. doi: 10.1093/biostatistics/kxh005.
Identifying changepoints is an important problem in molecular genetics. Our motivating example is from cancer genetics, where interest focuses on identifying areas of a chromosome that are more likely to contain a tumor suppressor gene. Loss of heterozygosity (LOH) is a binary measure of allelic loss; abrupt changes in LOH frequency along the chromosome may mark the boundaries of a region containing a tumor suppressor gene. Our interest is in testing for the presence of multiple changepoints in order to identify regions of increased LOH frequency. A complicating factor is the substantial heterogeneity in LOH frequency across patients: some patients have a very high LOH frequency while others have a low frequency. We develop a procedure for identifying multiple changepoints in heterogeneous binary data. We propose both approximate and full maximum-likelihood approaches and compare them with a naive approach that ignores the heterogeneity in the binary data. The methodology is used to estimate the pattern of LOH frequency on chromosome 13 in esophageal cancer patients and to isolate an area of inflated LOH frequency on that chromosome which may contain a tumor suppressor gene. Using simulations, we show that our approach works well and that it is robust to departures from some key modeling assumptions.
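To make the idea of a changepoint in binary data concrete, the following is a minimal sketch of a single-changepoint likelihood-ratio scan in the spirit of the naive approach mentioned in the abstract, in which patient-to-patient heterogeneity is ignored and LOH calls are pooled across patients at each marker. It is not the authors' procedure: the function names, the input layout (per-marker LOH counts and informative-patient counts), and the restriction to one changepoint are all assumptions made for illustration.

```python
import math


def bernoulli_loglik(successes, n):
    """Maximized Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0 or successes == 0 or successes == n:
        return 0.0  # the MLE fits the data perfectly (or there is no data)
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def best_single_changepoint(loh_counts, informative_counts):
    """Scan every split position; return (split index, likelihood-ratio statistic).

    Hypothetical input layout (an assumption, not taken from the paper):
      loh_counts[j]         -- number of patients showing LOH at marker j
      informative_counts[j] -- number of patients informative at marker j
    A split at index k places markers 0..k-1 in the left segment and
    markers k.. in the right segment. Returns (None, 0.0) if no split
    improves on the constant-frequency model.
    """
    m = len(loh_counts)
    total_s = sum(loh_counts)
    total_n = sum(informative_counts)
    null_ll = bernoulli_loglik(total_s, total_n)  # one common LOH frequency

    best_k, best_stat = None, 0.0
    left_s = left_n = 0
    for k in range(1, m):
        left_s += loh_counts[k - 1]
        left_n += informative_counts[k - 1]
        alt_ll = (bernoulli_loglik(left_s, left_n)
                  + bernoulli_loglik(total_s - left_s, total_n - left_n))
        stat = 2.0 * (alt_ll - null_ll)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Toy data (made up): LOH frequency jumps from 10% to 90% after the third marker.
k, stat = best_single_changepoint([1, 1, 1, 9, 9, 9], [10] * 6)
print(k, stat)
```

Because the statistic is maximized over all split positions, its null distribution is not a standard chi-square, so any significance assessment would need something like simulation or permutation. Pooling across patients is also exactly the simplification that the abstract says the approximate and full maximum-likelihood approaches are designed to avoid.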