Wu Ziqian, Park Jiyoon, Steiner Paul R, Zhu Bo, Zhang John X J
Thayer School of Engineering, Dartmouth College, Hanover, NH USA.
Dartmouth Hitchcock Medical Center, Lebanon, NH USA.
Res Sq. 2024 Mar 11:rs.3.rs-4061531. doi: 10.21203/rs.3.rs-4061531/v1.
Our study develops a generative adversarial network (GAN)-based method that generates faithful synthetic image data of human cardiomyocytes at varying stages of maturation. The synthetic data serve as a tool to significantly improve cell classification accuracy and, ultimately, to increase the throughput of computational analysis of cellular structure and function.
Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) were cultured on micropatterned, collagen-coated hydrogels of physiological stiffness to facilitate maturation, and optical measurements were performed for structural and functional analysis. Control groups were cultured on collagen-coated glass well plates. The resulting image recordings served as the real data for training the GAN model.
The results show that the GAN approach replicates true features of the real data, and that including such synthetic data significantly improves classification accuracy compared with using only real experimental data, which are often limited in scale and diversity.
By incorporating synthetic data, the proposed model outperformed four conventional machine learning algorithms in both data generalization ability and classification accuracy.
This work demonstrates the importance of integrating synthetic data when sample sizes are limited, and thereby addresses the challenges imposed by limited data availability.
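The comparison described above (training on real data alone versus real plus synthetic data) can be illustrated with a minimal sketch. It assumes the synthetic samples have already been generated and that each image has been reduced to a numeric feature vector; no GAN is implemented here. The nearest-centroid classifier is a simple stand-in and is not the classifier used in the study, and every function name, along with the idea of string stage labels, is hypothetical. The one design point it shows is that synthetic samples enter only the training set, while both models are scored on the same held-out real data.

```python
# Sketch only: a stand-in classifier and a hypothetical evaluation helper,
# not the method from the study.

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_nearest_centroid(features, labels):
    """Return a {label: centroid} mapping learned from the training data."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}


def predict(model, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))


def accuracy(model, features, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


def compare_real_vs_augmented(train_x, train_y, synth_x, synth_y,
                              test_x, test_y):
    """Score a real-only model and an augmented model on the same real test set.

    Synthetic samples are appended to the training data only; the test set
    is assumed to contain real samples exclusively.
    """
    real_model = fit_nearest_centroid(train_x, train_y)
    aug_model = fit_nearest_centroid(train_x + synth_x, train_y + synth_y)
    return {
        "real_only": accuracy(real_model, test_x, test_y),
        "real_plus_synthetic": accuracy(aug_model, test_x, test_y),
    }
```

Calling `compare_real_vs_augmented` with lists of feature vectors and matching label lists returns the two accuracies side by side. Whether augmentation helps in practice depends entirely on how faithful the synthetic samples are.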