Wetland area segmentation based on multi-scale features dual-coding Transformer

ZHAO Yuankun, HU Chunhua

Journal of Nanjing Forestry University (Natural Sciences Edition), 2026, Vol. 50, Issue (3): 229-238.

DOI: 10.12302/j.issn.1000-2006.202509033
Research article


Wetland area segmentation based on multi-scale features dual-coding Transformer


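Summary

【Objective】To efficiently extract regional information from unmanned aerial vehicle (UAV) wetland images, delineate wetland functional zones, and obtain data on woodland, lakes and other areas quickly and accurately, this study designs MfdFormer, a segmentation network for UAV wetland remote sensing images based on a dual-encoder structure and Transformer.【Method】Images were collected in the Hongze Lake wetland, Jiangsu Province, to build a semantic segmentation dataset with five classes: aquaculture areas, cultivated land, woodland, lakes and other land. MfdFormer adopts a primary-secondary encoding structure and combines Transformer with micro decoding channels to segment images semantically. The primary encoding channel uses pyramid downsampling modules; the secondary encoding channel consists of semantic-completion sliding-window attention modules and ordinary attention modules. The primary and secondary channels are connected in parallel, which reduces the feature loss caused by the downsampling modules and improves segmentation accuracy while keeping the parameter count and inference latency low. A semantic fusion module is added to the decoding channel to improve the separation of similar classes.【Result】Trained and tested on the Hongze Lake image data, MfdFormer reaches a mean intersection over union (mIoU) of 88.07% with 2.88 network parameters (the source gives no unit; presumably millions) and an inference time of 48.69 ms. The IoU for woodland is 93.13%, and the mIoU is 0.68 percentage points higher than that of TopFormer and 0.76 percentage points higher than that of HRNet. On the public UAVid dataset the network reaches an mIoU of 66.23%, 1.81 percentage points higher than TopFormer, which supports its competitiveness.【Conclusion】The MfdFormer semantic segmentation network achieves fast and accurate region segmentation of UAV images of the Hongze Lake wetland.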
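The IoU and mIoU figures above follow the usual definition for semantic segmentation: for each class, IoU = TP / (TP + FP + FN), and mIoU is the unweighted mean over classes. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The function names and the toy labels are illustrative assumptions, and details of the paper's protocol (for example, how absent classes are handled) are not stated in this abstract.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); a class absent from both maps yields None."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm):
    """Unweighted mean of the per-class IoUs, skipping absent classes."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid)


# Toy flattened label maps with 3 hypothetical classes (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
ious = per_class_iou(cm)
miou = mean_iou(cm)
print(ious, miou)
```

With these toy labels the per-class IoUs are 1/3, 2/3 and 1/2, so the mIoU is 0.5. A gap quoted in percentage points, such as the 0.68 above, is a plain subtraction of two mIoU values expressed in percent.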

Abstract

【Objective】Precise delineation of wetland ecosystems from unmanned aerial vehicle (UAV) imagery is essential for ecological resource management, particularly in environments with intricate land-cover patterns and dynamic seasonal variation. Existing semantic segmentation frameworks have two persistent limitations: feature degradation during hierarchical downsampling, and insufficient discriminative power for semantically overlapping categories (e.g., aquaculture ponds vs. natural water bodies). To address them, this study introduces MfdFormer, a dual-encoder Transformer-based network for UAV-borne wetland remote sensing. The architecture balances computational efficiency against high-precision segmentation, a key operational constraint in real-time environmental monitoring.【Method】The method centers on a hierarchically structured dual-encoding design. The primary encoder uses pyramidal spatial reduction modules with depthwise separable convolutions, compressing the input resolution through four stages while preserving discriminative edge features. The secondary encoder uses an information-completion multi-scale void attention (ICMVA) mechanism, which combines localized window-based self-attention with adaptive semantic gap-filling operations. This dual-path configuration captures fine-grained textures and long-range contextual dependencies concurrently, which is particularly important for distinguishing spectrally similar vegetation types. The decoding phase uses parameter-decoupling micro-decoders that progressively merge multi-scale features through channel-wise attention gating, followed by cross-level feature recalibration with a 3×3 depthwise convolution. A semantic fusion module in the decoder improves discrimination between morphologically similar categories.【Result】MfdFormer was evaluated on the Hongze Lake wetland dataset from Jiangsu Province, which comprises 1 872 annotated UAV images labeled with five land-cover classes. It achieves a mean intersection-over-union (mIoU) of 88.07% across all classes, with a class IoU of 93.13% for woodland. Under the same test conditions, its mIoU exceeds that of TopFormer by 0.68 percentage points and that of HRNet by 0.76 percentage points. Cross-domain validation on the UAVid urban remote sensing benchmark supports the model's generalizability, with an mIoU 1.81 percentage points higher than TopFormer's. Ablation experiments isolate the contribution of the ICMVA mechanism through component substitution: replacing ICMVA with standard windowed attention degrades performance, most acutely in texture-heterogeneous regions with mixed vegetation canopies and fragmented water bodies, where IoU decreases by 1.16 percentage points.【Conclusion】Omitting non-critical image information, combined with randomized multi-position local feature extraction through iterative minimal sampling, allows global contextual information to be reconstructed effectively, which makes segmentation boundaries more definite and accurate while reducing computational cost. For categories with large intra-class shape variance, reducing detailed feature extraction mitigates the risk of overfitting. Multi-dimensional feature fusion shows clear potential for recognizing complex wetland categories, because integrating heterogeneous feature dimensions helps the network understand objects at the macro scale and so improves the segmentation of UAV-acquired wetland imagery. MfdFormer balances segmentation precision and computational efficiency through its dual-branch feature extraction and multi-scale semantic integration. Results on heterogeneous datasets support its robustness on complex wetland landscapes with irregular boundaries and high intra-class variance, indicating practical value for large-scale wetland resource monitoring.
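The primary encoder described in the Method part uses depthwise separable convolutions, a common way to keep parameter counts low. As a rough illustration of the saving, the sketch below compares the weight count of a standard k×k convolution with that of a depthwise k×k convolution followed by a 1×1 pointwise convolution. The channel sizes are hypothetical and are not taken from MfdFormer, and the two functions are illustrative helpers rather than part of any library.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2 when bias is omitted

print(standard, separable, round(ratio, 4))
```

For this hypothetical layer the standard convolution has 73 728 weights and the separable version 8 768, roughly 12% as many. The ratio 1/c_out + 1/k² shows that, for 3×3 kernels and wide layers, the saving approaches a factor of nine.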

Keywords

semantic segmentation / Hongze Lake wetland / area segmentation / dual encoder / sliding window attention

Cite this article

ZHAO Yuankun, HU Chunhua. Wetland area segmentation based on multi-scale features dual-coding Transformer[J]. Journal of Nanjing Forestry University (Natural Sciences Edition), 2026, 50(3): 229-238. https://doi.org/10.12302/j.issn.1000-2006.202509033
CLC number: X14; TP75; S717

Funding

General Program of the National Natural Science Foundation of China (32572050)

Responsible editor: WU Zhuhua (吴祝华)