Method for aerial forest fire image recognition based on self-attention mechanism

WANG Junling¹,²; FAN Xijian¹,*; YANG Xubing¹; YE Qiaolin¹; FU Liyong³

doi:10.12302/j.issn.1000-2006.202308021

Journal of Nanjing Forestry University (Natural Science Edition), 2025, Vol. 49, Issue (2): 194-202.

Research article

Abstract

【Objective】Fire-point targets in aerial forest fire images are small and the backgrounds are complex. To improve the accuracy and robustness of aerial forest fire image recognition under these conditions, this study proposes FireViT, an image recognition method based on the self-attention mechanism. 【Method】Forest fire videos collected by drones in Chongli District, Zhangjiakou City, were preprocessed to construct the dataset. A 10-layer Vision Transformer (ViT) was used as the backbone network. Images were serialized with overlapping sliding windows, and the resulting patch sequence, with positional information embedded, served as the input to the first ViT layer. The region selection module extracted from the first nine ViT layers was embedded in batch into the tenth ViT layer through multi-head self-attention and a multilayer perceptron. This effectively amplified subtle differences between sub-images to capture the features of small targets. Finally, a contrastive feature learning strategy was adopted to construct the objective loss function for model prediction. To validate the model, the training-to-test sample ratio was set to 8∶2, 7∶3, 6∶4 and 4∶6, and recognition performance was compared with that of five classical models. 【Result】Under all four training/test split ratios, the model reached a recognition rate of 100%. Its accuracies were 94.82%, 95.05%, 94.90% and 94.80%, respectively, for an average accuracy of 94.89%, higher than that of the five comparison models. The model converged quickly to a high accuracy that remained stable in subsequent iterations, indicating strong generalization ability. The recognition rates under the four split ratios were 99.97%, 99.89%, 99.80% and 99.77%, respectively, also higher than those of the five comparison models. 【Conclusion】The proposed model combines the self-attention mechanism with weakly supervised learning. It mines local feature differences in aerial forest fire images across different environments and shows good generalization ability and robustness. It can provide an important basis for improving the capacity and efficiency of fire and fire-risk response and for preventing forest fires.
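The abstract does not give the patch size or stride used for the overlapping sliding-window serialization. The minimal sketch below assumes a single-channel image stored as nested lists, square patches of side `patch`, and a step `stride` smaller than `patch`. It also swaps in the standard sinusoidal positional encoding for the learned position embedding that ViT normally uses, because the abstract only says that positional information is embedded. The function names `num_windows`, `overlapping_patches` and `add_positions` are hypothetical and do not come from the paper.

```python
# Sketch under assumptions: single-channel image as nested lists, square
# patches, stride < patch. Function names are hypothetical, not from the paper.
import math
from typing import List

Image = List[List[float]]


def num_windows(size: int, patch: int, stride: int) -> int:
    """Number of window positions along one axis (windows must fit inside)."""
    return (size - patch) // stride + 1


def overlapping_patches(image: Image, patch: int, stride: int) -> List[List[float]]:
    """Serialize an image into flattened patches in row-major order.

    With stride < patch, neighbouring windows share pixels, so a small spot
    cut by one window border can still fall entirely inside a neighbour.
    """
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def add_positions(tokens: List[List[float]]) -> List[List[float]]:
    """Add a sinusoidal positional encoding to each token.

    Assumption: this stands in for ViT's learned position embedding; a real
    model would first project each patch to the model width with a linear layer.
    """
    d = len(tokens[0])
    out = []
    for pos, tok in enumerate(tokens):
        pe = [math.sin(pos / 10000 ** (k / d)) if k % 2 == 0
              else math.cos(pos / 10000 ** ((k - 1) / d))
              for k in range(d)]
        out.append([x + p for x, p in zip(tok, pe)])
    return out


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 image
    seq = overlapping_patches(img, patch=2, stride=1)
    print(len(seq), seq[0], seq[-1])  # 9 [0, 1, 4, 5] [10, 11, 14, 15]
    print(num_windows(224, 16, 16), num_windows(224, 16, 12))  # 14 18
```

Take a 224 × 224 input with 16-pixel patches. Changing the stride from 16 to an illustrative 12 enlarges the token grid from 14 × 14 to 18 × 18. The overlap that helps small fire spots therefore also raises the self-attention cost, which grows quadratically with sequence length.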
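The abstract does not detail how the region selection module picks regions from the first nine layers. One widely used design for fine-grained ViTs is the part selection module of TransFG. It multiplies the attention matrices of the earlier layers to accumulate attention from the [CLS] token to each patch, keeps the most-attended patch for each head, and passes only those tokens (plus [CLS]) to the last layer. The pure-Python sketch below illustrates that idea on the assumption that FireViT works similarly. `select_regions` and `gather_tokens` are hypothetical names, and attention matrices are plain nested lists.

```python
# Sketch under assumptions: TransFG-style part selection; FireViT's actual
# region selection module may differ. Names are hypothetical.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def select_regions(attn: List[List[Matrix]]) -> List[int]:
    """Pick one patch token per attention head.

    attn[l][h] is the (N+1) x (N+1) attention matrix of head h in layer l,
    with token 0 being [CLS]. Pass only the layers before the last one
    (e.g. the first nine of a 10-layer backbone). Attention is accumulated
    as A_L @ ... @ A_2 @ A_1, then the patch with the largest [CLS] weight
    is chosen for each head.
    """
    selected = []
    for h in range(len(attn[0])):
        acc = attn[0][h]
        for layer in attn[1:]:
            acc = matmul(layer[h], acc)
        cls_to_patches = acc[0][1:]  # drop the [CLS] -> [CLS] entry
        best = max(range(len(cls_to_patches)), key=cls_to_patches.__getitem__)
        selected.append(best + 1)  # shift back to the full token index
    return selected


def gather_tokens(hidden: List[List[float]], selected: List[int]) -> List[List[float]]:
    """Input sequence for the last layer: [CLS] followed by the chosen tokens."""
    return [hidden[0]] + [hidden[i] for i in selected]
```

Only one token per head reaches the last layer, so that layer attends over a short sequence made up of the most strongly attended regions. Attention rollout as described by Abnar and Zuidema also mixes the identity matrix into each layer's attention to account for residual connections. The raw product above leaves that out.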
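The abstract says only that a contrastive feature learning strategy is used to build the objective loss. A common form in fine-grained ViTs, again from TransFG, works on the [CLS] features of a batch. Same-class pairs are penalized by 1 - cosine similarity. Different-class pairs are penalized only when their cosine similarity exceeds a margin α. The total is averaged over all N² ordered pairs and added to the cross-entropy loss. The sketch below assumes that form with α = 0.4, which is TransFG's setting and not a value reported in this paper. `contrastive_loss` is a hypothetical name.

```python
# Sketch under assumptions: TransFG-style contrastive loss with margin 0.4;
# the loss actually used by FireViT may differ. Names are hypothetical.
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def contrastive_loss(features: List[Sequence[float]], labels: List[int],
                     margin: float = 0.4) -> float:
    """Batch contrastive loss over all ordered pairs (i, j), i == j included.

    Same-label pairs are pulled together via (1 - cos); different-label pairs
    are pushed apart only while their similarity exceeds `margin`.
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = cosine(features[i], features[j])
            if labels[i] == labels[j]:
                total += 1.0 - s
            else:
                total += max(s - margin, 0.0)
    return total / (n * n)
```

Because of the margin, different-class pairs that are already dissimilar enough contribute nothing. The penalty therefore concentrates on pairs the model still confuses.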

WANG Junling, FAN Xijian, YANG Xubing, et al. Method for aerial forest fire image recognition based on self-attention mechanism[J]. Journal of Nanjing Forestry University (Natural Science Edition), 2025, 49(2): 194-202. https://doi.org/10.12302/j.issn.1000-2006.202308021

CLC number: TP391.41
