Xu Ke, Zhu Yan, Cao Weixing, Jiang Xiaoping, Jiang Zhijian, Li Shuailong, Ni Jun
College of Agriculture, Nanjing Agricultural University, Nanjing, China.
National Engineering and Technology Center for Information Agriculture, Nanjing, China.
Front Plant Sci. 2021 Nov 5;12:732968. doi: 10.3389/fpls.2021.732968. eCollection 2021.
Single-modal images carry limited information for feature representation, and RGB images fail to detect grass weeds in wheat fields because of their similarity to wheat in shape. We propose a framework based on multi-modal information fusion for accurate detection of weeds in wheat fields in a natural environment, overcoming the limitations of a single modality in weed detection. First, we recode the single-channel depth image into a new three-channel image with a structure like that of an RGB image, which is suitable for feature extraction by a convolutional neural network (CNN). Second, multi-scale object detection is realized by fusing the feature maps output by different convolutional layers. A three-channel network structure is designed to account for both the independence of the RGB and depth information and the complementarity of the multi-modal information, and integrated learning is carried out by weight allocation at the decision level to achieve effective fusion of the multi-modal information. The experimental results show that, compared with weed detection based on RGB images alone, the accuracy of our method is significantly improved. Experiments with integrated learning show a mean average precision (mAP) of 36.1% for grass weeds and 42.9% for broad-leaf weeds, and an overall detection precision, as indicated by intersection over ground truth, of 89.3%, with the weights of the RGB and depth images set at α = 0.4 and β = 0.3. The results suggest that our method can accurately detect the dominant weed species in wheat fields, and that multi-modal fusion can effectively improve object detection performance.
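The abstract's first step, recoding a single-channel depth image into a three-channel image with an RGB-like structure, can be sketched as follows. The paper does not specify the exact encoding here, so this is a minimal illustration under an assumed scheme: depth is normalized and mapped through three complementary transforms so each channel carries a different view of the same signal.

```python
import numpy as np

def recode_depth_to_three_channels(depth, d_min=None, d_max=None):
    """Recode a single-channel depth map into a three-channel uint8 image
    shaped like an RGB image, suitable as input to a CNN backbone.

    The encoding below (linear, mid-range-emphasizing, and inverted
    channels) is an illustrative assumption, not the paper's method.
    """
    depth = depth.astype(np.float64)
    d_min = depth.min() if d_min is None else d_min
    d_max = depth.max() if d_max is None else d_max
    # Normalize depth to [0, 1], guarding against a flat depth map.
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)

    ch1 = norm                  # linear depth
    ch2 = np.sin(np.pi * norm)  # emphasizes mid-range depths
    ch3 = 1.0 - norm            # inverted depth

    img = np.stack([ch1, ch2, ch3], axis=-1)  # H x W x 3, like RGB
    return (img * 255).astype(np.uint8)
```

Any reversible three-channel mapping would serve the same purpose: it lets a standard RGB-pretrained CNN extract features from depth without architectural changes.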
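The decision-level weighted fusion described in the abstract can be sketched as a convex combination of the confidence scores from the network branches. The abstract reports only the RGB weight α = 0.4 and the depth weight β = 0.3; assigning the remaining weight 1 − α − β to a third, fused RGB-D branch is an assumption made for this illustration.

```python
import numpy as np

def fuse_decisions(score_rgb, score_depth, score_fused, alpha=0.4, beta=0.3):
    """Decision-level fusion of per-class detection confidences.

    alpha weights the RGB branch and beta the depth branch, following the
    paper's reported best setting (alpha = 0.4, beta = 0.3). Routing the
    remaining weight to a fused RGB-D branch is an assumption.
    """
    gamma = 1.0 - alpha - beta  # residual weight for the third branch
    return (alpha * np.asarray(score_rgb, dtype=np.float64)
            + beta * np.asarray(score_depth, dtype=np.float64)
            + gamma * np.asarray(score_fused, dtype=np.float64))
```

Because the weights sum to one, fused scores stay in the same [0, 1] range as the branch confidences, so the usual detection thresholding applies unchanged.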