Mollica Cristina, Tardella Luca
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Piazzale A. Moro 5, (00185) Roma, Italy.
Stat Med. 2014 Sep 20;33(21):3738-58. doi: 10.1002/sim.6224. Epub 2014 Jun 5.
We propose the use of probability models for ranked data as a useful alternative to a quantitative data analysis to investigate the outcome of bioassay experiments when the preliminary choice of an appropriate normalization method for the raw numerical responses is difficult or subject to criticism. We review standard distance-based and multistage ranking models and propose an original generalization of the Plackett-Luce model to account for the order of the ranking elicitation process. The usefulness of the novel model is illustrated with its maximum likelihood estimation for a real data set. Specifically, we address the heterogeneous nature of the experimental units via model-based clustering and detail the necessary steps for a successful likelihood maximization through a hybrid version of the expectation-maximization algorithm. The performance of the mixture model using the new distribution as mixture components is then compared with alternative mixture models for random rankings. A discussion on the interpretation of the identified clusters and a comparison with more standard quantitative approaches are finally provided.
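For orientation, the standard Plackett-Luce model mentioned above builds an ordering stage by stage: at each stage, one of the items not yet ranked is chosen with probability proportional to its positive support parameter. The sketch below is only a minimal illustration of that standard model. The names `plackett_luce_log_prob`, `reordered_log_prob`, `support`, and `stage_ranks` are hypothetical. The second function is one plausible reading of what accounting for the elicitation order could mean (rank positions filled in an order other than best-to-worst), not the formulation used in the paper.

```python
import math
from itertools import permutations


def plackett_luce_log_prob(ordering, support):
    """Log-probability of an ordering (most preferred item first) under
    the standard Plackett-Luce model with positive support parameters."""
    log_prob = 0.0
    tail = 0.0
    # Walk backwards so that `tail` is the total support of the items
    # still available at the stage where `item` is selected.
    for item in reversed(ordering):
        tail += support[item]
        log_prob += math.log(support[item] / tail)
    return log_prob


def reordered_log_prob(ordering, support, stage_ranks):
    """Assumption, not the paper's definition: stage_ranks[t] is the
    0-based rank position filled at stage t, so items are selected in
    that order rather than from first place to last place."""
    elicited = [ordering[r] for r in stage_ranks]
    return plackett_luce_log_prob(elicited, support)


if __name__ == "__main__":
    support = [0.5, 0.3, 0.2]  # illustrative values only
    for perm in permutations(range(3)):
        forward = math.exp(plackett_luce_log_prob(list(perm), support))
        backward = math.exp(reordered_log_prob(list(perm), support, [2, 1, 0]))
        print(perm, round(forward, 4), round(backward, 4))
```

With `stage_ranks = [0, 1, 2]` the second function reduces to the standard model. Any other permutation of the stages changes which orderings are favoured while the probabilities still sum to one over all orderings.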