Journal of Nanjing Forestry University (Natural Sciences Edition) ›› 2023, Vol. 47 ›› Issue (3): 1-10. doi: 10.12302/j.issn.1000-2006.202206048

Special Topic: Selected Papers from the 3rd China Forestry and Grassland Computer Application Conference (Executive Editor-in-Chief: LI Fengri)



Research on the optimized pest image instance segmentation method based on the Swin Transformer model

GAO Jiajun(), ZHANG Xu, GUO Ying(), LIU Yukun, GUO Anqi, SHI Mengmeng, WANG Peng, YUAN Ying   

  1. Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
  • Received: 2022-06-25 Revised: 2022-12-27 Online: 2023-05-30 Published: 2023-05-25
  • Contact: GUO Ying
  • Supported by: the Fundamental Research Funds for the Central Non-profit Research Institutes (CAFYBB2021ZB002)


Abstract:

【Objective】To achieve accurate pest monitoring, we propose an optimized image instance segmentation method that integrates the Swin Transformer, aiming to effectively address the difficulty of recognizing and segmenting multiple larval individuals in images taken under complex real-world scenarios.【Method】The Swin Transformer was selected to replace the backbone network of the Mask R-CNN instance segmentation model, and the resulting model was used to identify and segment images of Heortia vitessoides larvae, which damage Aquilaria sinensis. For Swin Transformer and ResNet models with different structural parameters, the input and output dimensions of each layer were adjusted, and each variant was set in turn as the backbone of Mask R-CNN for comparative experiments. The accuracy and quality of H. vitessoides larva identification and segmentation achieved by Mask R-CNN with the different backbones were analyzed quantitatively and qualitatively to determine the best model structure.【Result】(1) With the proposed method, the F1 score and average precision (AP) reached 89.7% and 88.0%, respectively, for pest detection (bounding-box framing), and 84.3% and 82.2%, respectively, for pest segmentation, improvements of 8.75% and 8.40% over the original Mask R-CNN in target framing and target segmentation, respectively. (2) For small-target pest identification and segmentation tasks, the F1 score and AP reached 88.4% and 86.3%, respectively, for pest detection, and 84.0% and 81.7%, respectively, for pest segmentation, improvements of 9.30% and 9.45% over the original Mask R-CNN in target framing and target segmentation, respectively.【Conclusion】For image instance segmentation tasks in complex real-world scenarios, recognition and segmentation performance depends largely on a model's ability to extract image features. The Mask R-CNN instance segmentation model integrated with the Swin Transformer extracts features more effectively in its backbone network and delivers better overall recognition and segmentation results. It can provide technical support for pest identification and monitoring, and offers a solution for protecting resources in agriculture, forestry, animal husbandry, and other industries.
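The backbone substitution described in the Method section can be sketched in a few lines of PyTorch. The following is a minimal, illustrative example (not the authors' implementation) of building a Mask R-CNN with a Swin Transformer backbone in torchvision; the Swin-T stage channel widths (96/192/384/768), anchor sizes, and the two-class setting (larva vs. background) are assumptions for illustration.

```python
# A minimal sketch, assuming torchvision >= 0.13: Mask R-CNN with a
# Swin-T backbone in place of the default ResNet. Stage indices,
# channel widths and anchor sizes are illustrative assumptions.
import torch
from torch import nn
from torchvision.models import swin_t
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import FeaturePyramidNetwork


class SwinFPNBackbone(nn.Module):
    """Swin-T feature extractor + FPN, exposing the interface MaskRCNN expects."""

    def __init__(self):
        super().__init__()
        swin = swin_t(weights="DEFAULT")  # ImageNet-pretrained; weights=None skips the download
        # swin.features holds 8 modules; indices 1, 3, 5, 7 end the four
        # resolution stages, with 96/192/384/768 output channels for Swin-T.
        self.stages = swin.features
        self.fpn = FeaturePyramidNetwork(
            in_channels_list=[96, 192, 384, 768], out_channels=256
        )
        self.out_channels = 256  # attribute required by MaskRCNN

    def forward(self, x):
        feats = {}
        names = iter(["0", "1", "2", "3"])  # keys expected by the default RoI poolers
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i in (1, 3, 5, 7):
                # torchvision's Swin is channels-last (B, H, W, C); FPN wants (B, C, H, W)
                feats[next(names)] = x.permute(0, 3, 1, 2)
        return self.fpn(feats)


backbone = SwinFPNBackbone()
# One anchor scale per FPN level, three aspect ratios each (assumed values).
anchor_gen = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 4,
)
# num_classes=2: one larva class plus background.
model = MaskRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_gen)

model.eval()
with torch.no_grad():
    # Each output dict carries "boxes", "labels", "scores" and "masks".
    predictions = model([torch.rand(3, 512, 512)])
```

The reason this kind of swap is straightforward is that torchvision's MaskRCNN only requires the backbone to expose an out_channels attribute and return a dict of feature maps. The reported F1 score is then the usual harmonic mean of detection precision P and recall R, F1 = 2PR/(P + R), computed separately for the framing (bounding-box) and segmentation (mask) outputs.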

Key words: pest recognition, Swin Transformer, Mask R-CNN, instance segmentation, Aquilaria sinensis, Heortia vitessoides

CLC number: