Hong Qingqing, Liu Wei, Zhu Yue, Ren Tianyu, Shi Changrong, Lu Zhixin, Yang Yunqin, Deng Ruiting, Qian Jing, Tan Changwei
Jiangsu Key Laboratory of Crop Genetics and Physiology/Jiangsu Key Laboratory of Crop Cultivation and Physiology, Agricultural College of Yangzhou University, Yangzhou, China.
Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops/Joint International Research Laboratory of Agriculture and Agri-Product Safety of the Ministry of Education of China/Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, College of Information Engineering, Yangzhou University, Yangzhou, China.
Front Plant Sci. 2024 Jul 2;15:1425131. doi: 10.3389/fpls.2024.1425131. eCollection 2024.
Accurate wheat ear counting is a key task in wheat phenotyping. Convolutional neural network (CNN) algorithms for wheat ear counting have become sophisticated tools; however, because of their limited receptive fields, CNNs cannot model global context information, which degrades counting performance. In this study, we present a hybrid attention network (CTHNet) for wheat ear counting from RGB images that combines local features with global context information. To extract multi-scale local features, a convolutional neural network is built with the Cross Stage Partial framework. To capture global context information, image patches tokenized from the CNN feature maps are encoded as an input sequence by a Pyramid Pooling Transformer. A feature fusion module then merges the local features with the global context information to enhance the feature representation. The proposed model is evaluated on the Global Wheat Head Detection Dataset and the Wheat Ear Detection Dataset, achieving mean absolute errors of 3.40 and 5.21, respectively, significantly outperforming previous studies.
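As a rough illustration of the kind of hybrid design the abstract describes, the sketch below wires a CSP-style convolutional branch (local features), a pooling-based transformer branch (global context), and a fusion head that regresses a density map whose sum gives the count. All module names, layer sizes, and pooling scales here are illustrative assumptions, not the authors' CTHNet implementation.

```python
# Hypothetical sketch of a hybrid CNN-Transformer counting network.
# Architecture details are assumptions; only the overall idea
# (CSP local branch + pooled-token transformer + fusion) follows the abstract.
import torch
import torch.nn as nn


class CSPBlock(nn.Module):
    """Cross Stage Partial style block (simplified): split channels,
    transform one half, then merge with the untouched half."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
        )
        self.merge = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)   # partial split across channels
        return self.merge(torch.cat([a, self.conv(b)], dim=1))


class LocalBranch(nn.Module):
    """Convolutional branch extracting multi-scale local features."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.stage = nn.Sequential(CSPBlock(dim), nn.MaxPool2d(2), CSPBlock(dim))

    def forward(self, x):
        return self.stage(self.stem(x))   # (B, dim, H/4, W/4)


class GlobalBranch(nn.Module):
    """Transformer branch: pool the CNN feature map into a short token
    sequence at several scales (pyramid-pooling style), then apply
    self-attention to obtain a global context descriptor."""
    def __init__(self, dim=64, pool_sizes=(1, 3, 6)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in pool_sizes])
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, feat):
        b, c, h, w = feat.shape
        # Tokens from pooled feature maps of several scales: 1+9+36 tokens here.
        tokens = torch.cat(
            [p(feat).flatten(2).transpose(1, 2) for p in self.pools], dim=1)
        tokens = self.encoder(tokens)                       # (B, N, C)
        context = tokens.mean(dim=1, keepdim=True)          # global summary
        return context.transpose(1, 2).view(b, c, 1, 1).expand(-1, -1, h, w)


class CTHNetSketch(nn.Module):
    """Fuse local and global features, then regress a density map whose
    spatial sum approximates the wheat ear count."""
    def __init__(self, dim=64):
        super().__init__()
        self.local = LocalBranch(dim=dim)
        self.globl = GlobalBranch(dim=dim)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 1), nn.ReLU(inplace=True))    # non-negative density

    def forward(self, x):
        local = self.local(x)
        context = self.globl(local)
        density = self.fuse(torch.cat([local, context], dim=1))
        return density, density.sum(dim=(1, 2, 3))          # map and count


if __name__ == "__main__":
    model = CTHNetSketch()
    img = torch.randn(2, 3, 256, 256)
    dmap, count = model(img)
    print(dmap.shape, count.shape)   # (2, 1, 64, 64) and (2,)
```

In a density-map formulation such as this sketch assumes, training would typically minimize the pixel-wise error between the predicted map and a Gaussian-smoothed ground-truth map, with the count recovered by summing the prediction.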