Journal of Nanjing Forestry University (Natural Sciences Edition) ›› 2023, Vol. 47 ›› Issue (4): 23-30. doi: 10.12302/j.issn.1000-2006.202208043

Special topic: Selected Papers from the 3rd China Forestry and Grassland Computer Application Conference (II)

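• Special report: Selected Papers from the 3rd China Forestry and Grassland Computer Application Conference (II) (Executive Editor-in-Chief: LI Fengri)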

Research on birdsong classification based on multi-view ensembles

LIU Jiang1, ZHANG Yan2,*, LYU Danju3,*, LU Jing3, XIE Shanshan3, ZI Jiali3, CHEN Xu3, ZHAO Youjie3

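  1. Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091
    2. College of Mathematics and Physics, Southwest Forestry University, Kunming, Yunnan 650224
    3. College of Big Data and Intelligence Engineering, Southwest Forestry University, Kunming, Yunnan 650224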
  • Received: 2022-08-20 Revised: 2023-02-17 Publication date: 2023-07-30 Release date: 2023-07-20
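  • Corresponding authors: * ZHANG Yan (zhangyan@swfu.edu.cn), professor, responsible for guiding the writing and revision of the paper; LYU Danju (lvdanjv@swfu.edu.cn), professor, responsible for the experimental design and for guiding the revision of the paper.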
  • About the author: LIU Jiang (jungleliu@swfu.edu.cn).
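  • Funding:
    Yunnan Provincial Major Science and Technology Special Project (202002AA10007); National Natural Science Foundation of China (61462078); National Natural Science Foundation of China (31860332); Scientific Research Fund of the Yunnan Provincial Department of Education (2021Y219); Scientific Research Fund of the Yunnan Provincial Department of Education (2022Y558)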

Birdsong classification research based on multi-view ensembles

LIU Jiang1, ZHANG Yan2,*, LYU Danju3,*, LU Jing3, XIE Shanshan3, ZI Jiali3, CHEN Xu3, ZHAO Youjie3

  1. Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091, China
    2. College of Mathematics and Physics, Southwest Forestry University, Kunming 650224, China
    3. College of Big Data and Intelligence Engineering, Southwest Forestry University, Kunming 650224, China
  • Received: 2022-08-20 Revised: 2023-02-17 Online: 2023-07-30 Published: 2023-07-20

Abstract:

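【Objective】This study attempts to fuse multi-view features to maximize feature information and proposes a multi-view cascaded ensemble convolutional neural network (MVC-CNN) method for birdsong classification, building a classification model with strong generalization ability to support further research on bird species diversity conservation and intelligent monitoring of the ecological environment.【Method】Using 16 types of birdsong audio data as the research objects, three types of birdsong spectrograms were generated with the short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) to form the multi-view feature data. These spectrograms were used as inputs to convolutional neural networks (CNNs) to train the view-specific base classifiers STFT-CNN, WT-CNN and HHT-CNN. The Bagging and Stacking ensemble methods were then used to build a multi-view Bagging ensemble CNN (MVB-CNN) model and a multi-view Stacking ensemble CNN (MVS-CNN) model, respectively. Building on the strong feature extraction capability of CNNs, the MVC-CNN model was proposed: the deep features extracted by the CNNs from the different views are concatenated and fused, and a support vector machine (SVM) serves as the final classifier that produces the classification result.【Result】The accuracies of the base classifiers WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; the accuracies of the multi-view ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and the accuracy of the multi-view cascaded ensemble model MVC-CNN was 95.76%. The accuracy of MVC-CNN was 6.65 to 14.76 percentage points higher than that of the single-view base classifiers, and 5.84 and 2.22 percentage points higher than that of MVB-CNN and MVS-CNN, respectively.【Conclusion】The proposed MVC-CNN model fully combines the advantages of the multi-view features of birdsong and effectively improves birdsong classification, with high stability and better generalization ability, providing a technical solution for multi-view birdsong classification research.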
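The difference between decision-level ensembling (as in MVB-CNN) and feature-level cascading (as in MVC-CNN) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy feature vectors and the predicted labels are hypothetical, majority voting is assumed as the combination rule (the abstract does not state it), and the CNN feature extractors and the SVM are omitted.

```python
from collections import Counter


def majority_vote(predictions):
    """Decision-level fusion: return the label chosen by most base classifiers.

    Ties are broken in favour of the label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def cascade_features(view_features):
    """Feature-level fusion: concatenate per-view feature vectors in a fixed order."""
    fused = []
    for view in ("STFT", "WT", "HHT"):
        fused.extend(view_features[view])
    return fused


# Hypothetical outputs for one recording (not data from the paper).
base_predictions = ["species_03", "species_07", "species_03"]  # STFT-CNN, WT-CNN, HHT-CNN
deep_features = {
    "STFT": [0.12, 0.80],
    "WT": [0.45, 0.10],
    "HHT": [0.33, 0.67],
}

voted_label = majority_vote(base_predictions)
fused_vector = cascade_features(deep_features)
```

In a cascaded model of this kind, a vector such as `fused_vector` would be passed to the final classifier (an SVM in the paper) rather than combining the labels predicted by the base classifiers.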

Key words: feature extraction, multi-view, ensemble learning, convolutional neural network

Abstract:

【Objective】This study aimed to build a birdsong classification model with strong generalization ability by integrating multi-view features to maximize feature information, in order to support further research on bird species diversity conservation and intelligent monitoring of the ecological environment.【Method】Using 16 types of birdsong audio data as the research objects, the short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) were used to generate three types of birdsong spectrograms, which together constitute the multi-view feature data. These spectrograms served as inputs to convolutional neural networks (CNNs) to train the base classifiers STFT-CNN, WT-CNN and HHT-CNN, one for each view. The multi-view bagging ensemble convolutional neural network (MVB-CNN) and multi-view stacking ensemble convolutional neural network (MVS-CNN) models were constructed using the bagging and stacking ensemble methods, respectively. Building on the strong feature extraction capability of CNNs, the multi-view cascaded ensemble convolutional neural network (MVC-CNN) model was proposed: the deep features extracted by the CNNs from the different views are concatenated and fused, and a support vector machine (SVM) is used as the final classifier.【Result】The accuracies of the base classifiers WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; the accuracies of the ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and the accuracy of the multi-view cascaded ensemble model MVC-CNN was 95.76%. The accuracy of the MVC-CNN model was 6.65 to 14.76 percentage points higher than that of the single-view base classifiers, and 5.84 and 2.22 percentage points higher than that of the MVB-CNN and MVS-CNN models, respectively.【Conclusion】The MVC-CNN model proposed in this study fully combines the advantages of the multi-view features of birdsong and effectively improves birdsong classification, with high stability and better generalization ability, providing a technical solution for multi-view birdsong classification research.
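The reported gains are differences between accuracies, that is, percentage points rather than relative improvements. The short Python check below reproduces them from the accuracies stated in the abstract; the dictionary layout and the helper name `gain_over` are illustrative assumptions, not part of the paper.

```python
# Accuracies (%) as stated in the abstract.
accuracy = {
    "WT-CNN": 89.11,
    "STFT-CNN": 88.36,
    "HHT-CNN": 81.00,
    "MVB-CNN": 89.92,
    "MVS-CNN": 93.54,
    "MVC-CNN": 95.76,
}


def gain_over(model, baseline):
    """Absolute accuracy gain of `model` over `baseline`, in percentage points."""
    return round(accuracy[model] - accuracy[baseline], 2)


# Gain of the cascaded model over every other model.
gains = {
    name: gain_over("MVC-CNN", name)
    for name in accuracy
    if name != "MVC-CNN"
}

for name, gain in gains.items():
    print(f"MVC-CNN vs {name}: +{gain:.2f} percentage points")
```

The smallest and largest single-view gains (over WT-CNN and HHT-CNN) give the 6.65 to 14.76 range quoted above.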

Key words: feature extraction, multi-view, ensemble learning, convolutional neural network

CLC number: