Okazaki Akira, Kawano Shuichi
Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu 182-8585, Tokyo, Japan.
Faculty of Mathematics, Kyushu University, 744 Motooka, Nishi-ku 819-0395, Fukuoka, Japan.
Entropy (Basel). 2022 Dec 17;24(12):1839. doi: 10.3390/e24121839.
Multi-task learning is a statistical methodology that aims to improve the generalization performance of estimation and prediction tasks by sharing common information among multiple tasks. Compositional data, in turn, consist of proportions whose components sum to one. Because the components of compositional data depend on each other, existing multi-task learning methods cannot be applied to them directly. Within the multi-task learning framework, network lasso regularization makes it possible to treat each sample as a separate task and to construct a different model for each one. In this paper, we propose a multi-task learning method for compositional data using a sparse network lasso. We focus on the symmetric form of the log-contrast model, a regression model with compositional covariates. By taking relationships among samples into account, the proposed method extracts latent clusters and relevant variables from compositional data. Its effectiveness is evaluated through simulation studies and an application to gut microbiome data. Both sets of results show that the proposed method achieves higher prediction accuracy than existing methods when information about the relationships among samples is appropriately obtained.
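The abstract combines two ingredients: the symmetric form of the log-contrast model and a sparse network lasso penalty. The sketch below illustrates both under stated assumptions. The symmetric log-contrast model is taken to be y = sum_j beta_j log x_j subject to sum_j beta_j = 0, the zero-sum form used in regression with compositional covariates. The objective is assumed to combine a squared loss, an edge-weighted penalty r_ik * ||beta_i - beta_k||_2 for each pair of related samples (one coefficient vector per sample, i.e. one task per sample), and an L1 penalty for variable selection. The function names, the edge weights r_ik, and the parameters lam_net and lam_l1 are hypothetical, and the scaling of each term may differ from the paper's formulation. The code only evaluates the objective; it is not the authors' estimation algorithm.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def log_contrast_predict(x: Vector, beta: Vector) -> float:
    """Symmetric log-contrast prediction: sum_j beta_j * log(x_j).

    Scale invariance requires sum_j beta_j == 0 (see satisfies_zero_sum).
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if any(v <= 0.0 for v in x):
        raise ValueError("log-contrast models need strictly positive components")
    return sum(b * math.log(v) for b, v in zip(beta, x))


def satisfies_zero_sum(beta: Vector, tol: float = 1e-9) -> bool:
    """Check the symmetric-form constraint sum_j beta_j = 0."""
    return abs(sum(beta)) <= tol


def sparse_network_lasso_objective(
    X: List[Vector],
    y: Vector,
    B: List[Vector],
    edges: List[Tuple[int, int, float]],
    lam_net: float,
    lam_l1: float,
) -> float:
    """Hypothetical objective (assumed form, not the paper's exact one):

    sum_i (y_i - log_contrast_predict(X_i, B_i))^2
      + lam_net * sum over edges (i, k, r_ik) of r_ik * ||B_i - B_k||_2
      + lam_l1 * sum_i ||B_i||_1
    """
    if not all(satisfies_zero_sum(b) for b in B):
        raise ValueError("every coefficient vector must sum to zero")
    # Per-sample squared prediction error (each sample has its own B_i).
    loss = sum(
        (yi - log_contrast_predict(xi, bi)) ** 2 for xi, yi, bi in zip(X, y, B)
    )
    # Group penalty pulling coefficients of related samples together.
    fusion = sum(
        r * math.sqrt(sum((a - c) ** 2 for a, c in zip(B[i], B[k])))
        for i, k, r in edges
    )
    # L1 penalty encouraging sparse (variable-selecting) coefficients.
    sparsity = sum(abs(v) for b in B for v in b)
    return loss + lam_net * fusion + lam_l1 * sparsity


if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    beta = [1.0, -0.5, -0.5]
    # Zero-sum coefficients make the prediction invariant to rescaling x.
    print(log_contrast_predict(x, beta), log_contrast_predict([2 * v for v in x], beta))
```

Because the coefficients sum to zero, multiplying a composition by any positive constant adds log(c) * sum_j beta_j = 0 to the prediction, so only the relative proportions matter, which is what makes the constraint natural for compositional covariates.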