Efficient distribution estimation for data with unobserved sub-population identifiers.

Authors

Ma Yanyuan, Wang Yuanjia

Affiliation

Department of Statistics, Texas A&M University, College Station, TX 77845.

Publication

Electron J Stat. 2012;6:710-737. doi: 10.1214/12-EJS690.

Abstract

We study efficient nonparametric estimation of the distribution functions of several scientifically meaningful sub-populations from data consisting of mixed samples in which the sub-population identifiers are missing. Only the probabilities that each observation belongs to each sub-population are available. The problem arises in several biomedical studies, such as quantitative trait locus (QTL) analysis and genetic studies with ungenotyped relatives, where the scientific interest lies in estimating the cumulative distribution function of a trait given a specific genotype. In these studies, however, subjects' genotypes may not be directly observed, so the distribution of the trait outcome is a mixture of several genotype-specific distributions. We characterize the complete class of consistent estimators, which includes members such as one type of nonparametric maximum likelihood estimator (NPMLE) and least squares or weighted least squares estimators. We identify the efficient estimator in the class, which reaches the semiparametric efficiency bound, and we implement it using a simple procedure that remains consistent even if several components of the estimator are mis-specified. In addition, our close inspection of two commonly used NPMLEs in these problems yields the surprising result that the NPMLE in one form is highly inefficient, while in the other form it is inconsistent. We provide simulation procedures to illustrate the theoretical results and demonstrate the proposed methods through two real data examples.
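As a rough illustration of the least squares member of the class mentioned in the abstract, the sketch below assumes that observation i comes with a known probability vector q_i, so that P(Y_i <= t | q_i) = sum_j q_ij F_j(t). Regressing the indicators 1(Y_i <= t) on q_i then gives an unweighted least squares estimate of each F_j(t). This is not the efficient estimator identified in the paper. The function names, the two-component normal setup, and the three probability vectors are hypothetical choices made only for this example.

```python
import numpy as np


def ols_cdf_estimate(y, q, t_grid):
    """Unweighted least squares estimate of sub-population CDFs.

    y      : (n,) observed outcomes
    q      : (n, p) known membership probabilities (each row sums to 1)
    t_grid : (m,) points at which to estimate the CDFs
    Returns an (m, p) array whose column j estimates F_j on t_grid.
    """
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # Indicator matrix: entry (i, k) is 1 when y_i <= t_k.
    ind = (y[:, None] <= t_grid[None, :]).astype(float)
    # Solve min_F ||ind - q @ F||^2 for all grid points at once.
    coef, *_ = np.linalg.lstsq(q, ind, rcond=None)
    return coef.T


def simulate(n, seed=0):
    """Hypothetical mixture data: N(0, 1) and N(2, 1) sub-populations."""
    rng = np.random.default_rng(seed)
    # Assumed design: each subject has one of three probability vectors.
    q_levels = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
    q = q_levels[rng.integers(0, 3, size=n)]
    # Latent labels are drawn here but never passed to the estimator.
    in_second = rng.random(n) < q[:, 1]
    y = np.where(in_second, rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n))
    return y, q


if __name__ == "__main__":
    y, q = simulate(20000)
    est = ols_cdf_estimate(y, q, [0.0, 2.0])
    print(est)  # rows: t = 0 and t = 2; columns: sub-populations 1 and 2
```

Because the estimate at each t is an unconstrained regression coefficient, it is not guaranteed to lie in [0, 1] or to be monotone in t in small samples; a practical implementation would likely need a post-processing step, which this sketch leaves out.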

