Vazquez Blanca, Hevia-Montiel Nidiyare, Perez-Gonzalez Jorge, Haro Paulina
Unidad Académica del Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas del Estado de Yucatán, Universidad Nacional Autónoma de México, Mérida, Yucatán, Mexico.
Instituto de Investigaciones en Ciencias Veterinarias, Universidad Autónoma de Baja California, Mexicali, Baja California, Mexico.
PLoS One. 2025 Mar 24;20(3):e0315843. doi: 10.1371/journal.pone.0315843. eCollection 2025.
Chagas disease (CD), caused by the protozoan parasite Trypanosoma cruzi (T. cruzi), is a major public health concern across most of the American continent and causes 12,000 deaths every year. CD manifests clinically in two phases (acute and chronic), and diagnosis can be complicated by the differences between the phases and the long interval between them. Strategies are still lacking for automatically distinguishing healthy from T. cruzi-infected individuals when data are missing or limited. In this work, we propose a Weighted Variational Auto-Encoder (W-VAE) for imputing and augmenting multimodal data in order to classify healthy individuals and individuals in the acute or chronic phase of T. cruzi infection in a murine model. W-VAE is a deep generative architecture trained with a newly proposed loss function that includes a weighting factor and a masking mechanism to improve the quality of the generated data. We imputed and augmented data from four modalities: electrocardiography signals, echocardiography images, Doppler spectrum, and ELISA antibody titers. We evaluated the generated data on several multi-class classification tasks that identify healthy individuals and individuals in the acute or chronic phase of infection. In each task, we assessed several classifiers, missing rates, and feature-selection methods. The best accuracy obtained was 92 ± 4% in training and 95% in the final test, using a Gaussian Process Classifier with a missing rate of 50%. Per-class accuracy was 95% for healthy individuals and for individuals in the acute phase, and 100% for individuals in the chronic phase. Our approach can be useful for generating data to study the phases of T. cruzi infection.
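The abstract does not give the form of the W-VAE loss, so the following is only a minimal sketch of how a weighting factor and a masking mechanism are commonly combined in a VAE objective, not the authors' formulation. It assumes a squared-error reconstruction term restricted to observed entries and a weight applied to the KL term; the function names `masked_weighted_vae_loss` and `impute`, the placement of `weight`, and all numeric values are hypothetical.

```python
import math


def masked_weighted_vae_loss(x, x_hat, mask, mu, log_var, weight=1.0):
    """Illustrative per-sample loss (an assumption, not the published W-VAE loss).

    x, x_hat : observed and reconstructed feature vectors
    mask     : 1 where a feature was observed, 0 where it is missing
    mu, log_var : parameters of the approximate posterior N(mu, diag(exp(log_var)))
    weight   : hypothetical weighting factor applied to the KL term
    """
    # Reconstruction: mean squared error over observed entries only,
    # so placeholder values at missing positions never contribute.
    n_obs = max(sum(mask), 1)
    recon = sum(m * (a - b) ** 2 for a, b, m in zip(x, x_hat, mask)) / n_obs
    # KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))
    return recon + weight * kl


def impute(x, x_hat, mask):
    """Keep observed values and fill missing ones from the reconstruction."""
    return [a if m else b for a, b, m in zip(x, x_hat, mask)]


# Toy sample: the third feature is missing (0.0 is only a placeholder).
x = [1.0, 2.0, 0.0]
x_hat = [1.0, 3.0, 5.0]
mask = [1, 1, 0]
loss = masked_weighted_vae_loss(x, x_hat, mask, mu=[0.0, 0.0], log_var=[0.0, 0.0], weight=0.5)
completed = impute(x, x_hat, mask)
```

In this sketch the mask keeps the placeholder at the missing position from being penalized, and the same mask decides which entries are taken from the reconstruction at imputation time.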