Ma Yanyuan, Wang Yuanjia
Texas A&M University, College Station, USA.
Columbia University, New York, USA.
J R Stat Soc Ser C Appl Stat. 2014 Jan;63(1):1-23. doi: 10.1111/rssc.12025. Epub 2013 Aug 8.
We consider non-parametric estimation of disease onset distribution functions in multiple populations by using censored data with unknown population identifiers. The problem is motivated by studies that aim to estimate the age-specific disease risk distribution in carriers of deleterious mutations, for use in genetic counselling and in the design of therapeutic intervention trials to modify disease progression (i.e. to slow the development of symptoms and to delay the onset of disease). In some of these studies, the distribution of disease risk in participants takes a mixture form. Although the population identifiers are missing, study design and scientific knowledge allow the probability that a subject belongs to each population to be calculated. We propose a general family of weighted least squares estimators and show that existing consistent non-parametric methods belong to this family. We identify a computationally simple estimator in the family, study its asymptotic properties and show that it achieves a significant gain in efficiency over the existing estimators in the literature. An application to a large genetic epidemiological study of Huntington's disease reveals information on the age-at-onset distribution of the disease that sheds light on some clinical hypotheses.
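To make the weighted least squares idea concrete, here is a minimal sketch in Python. It is not the estimator proposed in the paper. It assumes a simplified setting with no censoring, in which each subject i has a known vector of population membership probabilities q_i, so that P(T_i <= t) = q_i' F(t), and F(t) can be estimated at each time t by regressing the indicator 1(T_i <= t) on q_i. The function name `wls_mixture_cdf`, its arguments, and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def wls_mixture_cdf(times, mix_probs, t_grid, weights=None):
    """Weighted least squares estimate of component CDFs from mixture data.

    Illustrative assumption: onset times are fully observed (no censoring).
    times     : (n,) observed onset times
    mix_probs : (n, p) known probabilities that subject i belongs to population k
    t_grid    : (m,) time points at which to estimate each CDF
    weights   : optional (n,) non-negative weights; defaults to equal weights
    Returns a (p, m) array whose row k estimates F_k on t_grid.
    """
    times = np.asarray(times, dtype=float)
    Q = np.asarray(mix_probs, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    w = np.ones(len(times)) if weights is None else np.asarray(weights, dtype=float)

    # Y[i, j] = 1 if subject i had onset by time t_grid[j], else 0.
    Y = (times[:, None] <= t_grid[None, :]).astype(float)

    # Normal equations (Q' W Q) F = Q' W Y, solved for all grid points at once.
    QtW = Q.T * w
    return np.linalg.solve(QtW @ Q, QtW @ Y)


if __name__ == "__main__":
    # Simulated example (hypothetical data): two populations with exponential
    # onset times, and membership probabilities known only up to q_i.
    rng = np.random.default_rng(0)
    n = 2000
    q1 = rng.choice([0.0, 0.5, 1.0], size=n)
    Q = np.column_stack([q1, 1.0 - q1])
    in_pop1 = rng.random(n) < q1
    times = np.where(in_pop1, rng.exponential(1.0, n), rng.exponential(3.0, n))
    print(wls_mixture_cdf(times, Q, [1.0, 2.0]))
```

When every q_i is a unit vector (population membership is known), this reduces to the empirical distribution function within each population. Because the fit is unconstrained, the estimates need not be monotone in t or lie in [0, 1], and handling censored onset times requires additional machinery that this sketch does not attempt.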