Yu Peng, Lian Yumin, Xie Elliot, Zuleger Cindy L, Albertini Richard J, Albertini Mark R, Newton Michael A
Department of Statistics, University of Wisconsin, Madison.
Department of Chemistry, Laboratory of Genetics, University of Wisconsin, Madison.
Ann Appl Stat. 2025 Sep;19(3):1884-1907. doi: 10.1214/25-aoas2032. Epub 2025 Aug 28.
Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may offer a relatively easy way to enrich a lymphocyte sample for cells relevant to the disease response, because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of the sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate the calculations in surrogate selection studies of melanoma and in single-cell genomic studies of T cell repertoires.
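The abstract does not give the exact form of the coupled model, so the following is only a rough, hypothetical sketch of the general idea, not the authors' specification. It assumes that each clonotype grows from one founder cell under a linear birth-death process. It also assumes that clonotypes are exchangeable because their birth rates are drawn i.i.d. from a shared gamma distribution, and that all clonotypes share one death rate. The function names (`clone_size`, `simulate_repertoire`, `sample_diversity`), the gamma parameters, the size cap, and the time horizon are illustrative choices. The sketch then samples cells without replacement and computes two elementary sample diversity statistics: clonotype richness and a Simpson-type index.

```python
import random
from collections import Counter


def clone_size(birth, death, t_max, rng, cap=2000):
    """Hypothetical linear birth-death process started from one founder cell.

    Simulated event by event; returns the clone size at time t_max
    (0 if extinct, or `cap` if the illustrative size cap is reached).
    """
    n, t = 1, 0.0
    while 0 < n < cap:
        # With n cells, the next event arrives at total rate n * (birth + death)
        t += rng.expovariate(n * (birth + death))
        if t > t_max:
            break
        # The event is a division with probability birth / (birth + death), else a death
        n += 1 if rng.random() < birth / (birth + death) else -1
    return n


def simulate_repertoire(n_clonotypes, death, t_max, rng):
    """Assumed exchangeable layer: i.i.d. gamma birth rates across clonotypes."""
    births = [rng.gammavariate(2.0, 0.4) for _ in range(n_clonotypes)]
    return [clone_size(b, death, t_max, rng) for b in births]


def sample_diversity(sizes, m, rng):
    """Draw m cells without replacement; return (richness, Simpson-type index)."""
    # Pool all cells, labeling each cell by its clonotype index
    population = [k for k, s in enumerate(sizes) for _ in range(s)]
    m = min(m, len(population))
    if m < 2:
        raise ValueError("need at least two cells to sample")
    counts = Counter(rng.sample(population, m))
    richness = len(counts)  # number of distinct clonotypes observed
    # Estimated probability that two distinct sampled cells share a clonotype
    simpson = sum(c * (c - 1) for c in counts.values()) / (m * (m - 1))
    return richness, simpson


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = simulate_repertoire(200, death=0.5, t_max=2.0, rng=rng)
    print(sample_diversity(sizes, 300, rng))
```

Repeating the simulation over many seeds gives Monte Carlo draws of the richness and Simpson statistics. This is one simple way to study their sampling variability under an assumed clonal-growth model.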