JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2023, Vol. 47 ›› Issue (4): 23-30. doi: 10.12302/j.issn.1000-2006.202208043

Special Issue: Selected Papers from the Third China Forestry and Grassland Computer Application Conference (II)

Birdsong classification research based on multi-view ensembles

LIU Jiang1, ZHANG Yan2,*, LYU Danju3,*, LU Jing3, XIE Shanshan3, ZI Jiali3, CHEN Xu3, ZHAO Youjie3

    1. Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091, China
    2. College of Mathematics and Physics, Southwest Forestry University, Kunming 650224, China
    3. College of Big Data and Intelligence Engineering, Southwest Forestry University, Kunming 650224, China
  • Received: 2022-08-20 Revised: 2023-02-17 Online: 2023-07-30 Published: 2023-07-20

Abstract:

【Objective】This study aimed to build a birdsong classification model with strong generalization ability by integrating multi-view features and making full use of the feature information, in order to support research on bird species diversity conservation and intelligent ecological monitoring.【Method】Using 16 types of birdsong audio data as the research objects, three feature extraction methods, the short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT), were used to generate three types of birdsong spectrograms, which together constituted the multi-view feature data. These spectrograms served as the input of a convolutional neural network (CNN) to train one base classifier per view: STFT-CNN, WT-CNN and HHT-CNN. The multi-view bagging ensemble convolutional neural network (MVB-CNN) and multi-view stacking ensemble convolutional neural network (MVS-CNN) models were constructed using the bagging and stacking ensemble methods, respectively. Exploiting the strong feature extraction capability of CNNs, a multi-view cascaded ensemble convolutional neural network (MVC-CNN) model was proposed, which cascades and fuses the deep features extracted by the CNNs from the different views and obtains the classification result with a support vector machine (SVM).【Result】The accuracies of the base classifiers WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; those of the ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and that of the multi-view cascaded ensemble model MVC-CNN was 95.76%. The accuracy of MVC-CNN was 6.65 to 14.76 percentage points higher than that of the single-view classification models, and 5.84 and 2.22 percentage points higher than that of MVB-CNN and MVS-CNN, respectively.【Conclusion】The MVC-CNN model proposed in this study fully combines the advantages of the multi-view features of birdsong and effectively improves birdsong classification, with greater stability and better generalization ability, providing a technical solution for multi-view birdsong classification research.
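
The difference between decision-level and feature-level fusion described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional "deep features" and the class labels "A" and "B" are all hypothetical, majority voting stands in for the combination step of a bagging-style ensemble, and a nearest-centroid rule stands in for the SVM so that the example needs only the Python standard library. Stacking is not shown.

```python
from collections import Counter
from math import dist


def majority_vote(predictions):
    """Decision-level fusion: return the most common label.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def cascade_features(view_features):
    """Feature-level fusion: concatenate per-view vectors into one."""
    fused = []
    for features in view_features:
        fused.extend(features)
    return fused


def nearest_centroid(train_vectors, train_labels, query):
    """Stand-in for the SVM: label of the closest class centroid."""
    by_label = {}
    for vector, label in zip(train_vectors, train_labels):
        by_label.setdefault(label, []).append(vector)
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], query))


# Toy per-view feature vectors for four recordings, in the order
# (STFT view, WT view, HHT view). All values are invented for illustration.
train_views = [
    ([0.9, 0.1], [0.8, 0.2], [0.6, 0.4]),  # class "A"
    ([0.8, 0.2], [0.9, 0.1], [0.5, 0.5]),  # class "A"
    ([0.2, 0.8], [0.1, 0.9], [0.4, 0.6]),  # class "B"
    ([0.1, 0.9], [0.2, 0.8], [0.5, 0.5]),  # class "B"
]
train_labels = ["A", "A", "B", "B"]

# Cascade: fuse the three views of each recording, then classify once.
fused_train = [cascade_features(views) for views in train_views]
query = cascade_features(([0.7, 0.3], [0.75, 0.25], [0.45, 0.55]))
cascade_label = nearest_centroid(fused_train, train_labels, query)

# Voting: each view's base classifier predicts separately (hypothetical
# outputs here), and only the predicted labels are combined.
vote_label = majority_vote(["A", "A", "B"])
```

In the voting case only the labels from each view reach the combiner, whereas in the cascade case the final classifier sees the concatenated features of all views at once. This is one plausible reading of why feature-level fusion could retain more information, although the abstract itself does not explain the accuracy gap.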

Key words: feature extraction, multi-view, ensemble learning, convolutional neural network

CLC Number: