Birdsong classification research based on multi-view ensembles

LIU Jiang, ZHANG Yan, LYU Danju, LU Jing, XIE Shanshan, ZI Jiali, CHEN Xu, ZHAO Youjie

JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2023, Vol. 47 ›› Issue (4): 23-30. DOI: 10.12302/j.issn.1000-2006.202208043

Abstract

【Objective】This study aimed to build a birdsong classification model with strong generalization ability that integrates multi-view features and makes full use of the available feature information, supporting research on bird species diversity conservation and intelligent ecological monitoring.

【Method】Audio recordings of 16 bird species were used as the research data. The short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) were used to generate three types of birdsong spectrograms, which constituted the multi-view feature data and served as the input to convolutional neural networks (CNNs); the base classifiers STFT-CNN, WT-CNN and HHT-CNN were trained on the corresponding views. The multi-view bagging ensemble convolutional neural network (MVB-CNN) and multi-view stacking ensemble convolutional neural network (MVS-CNN) models were then constructed using the bagging and stacking ensemble methods, respectively. Exploiting the feature extraction capability of CNNs, a multi-view cascade ensemble convolutional neural network (MVC-CNN) model was further proposed, in which the deep features extracted by the CNNs from the different views were cascaded and fused, and the classification results were obtained with a support vector machine (SVM).

【Result】The accuracies of the base classifiers WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; the accuracies of the ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and the accuracy of the multi-view cascade ensemble model MVC-CNN reached 95.76%. The accuracy of MVC-CNN was 6.65% to 14.76% higher than that of the single-view classification models, and 5.84% and 2.22% higher than that of MVB-CNN and MVS-CNN, respectively.

【Conclusion】The proposed MVC-CNN model fully exploits the complementary advantages of multi-view birdsong features, effectively improving classification performance with greater stability and better generalization ability, and provides a technical solution for multi-view birdsong classification research.
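The three views described in the Method section can be reproduced in outline with standard signal-processing libraries. The following Python sketch is an editorial illustration, not the authors' implementation: it assumes librosa for the STFT spectrogram, PyWavelets for a Morlet-wavelet scalogram, and the PyEMD package plus the Hilbert transform for a simple Hilbert-Huang representation; the file name, window sizes, scale range and wavelet choice are placeholder assumptions rather than parameters reported in the paper.

# Multi-view feature extraction sketch (illustrative only; all parameters are assumptions).
import numpy as np
import librosa                    # STFT spectrogram
import pywt                       # continuous wavelet transform (scalogram)
from PyEMD import EMD             # empirical mode decomposition (pip package "EMD-signal")
from scipy.signal import hilbert  # Hilbert transform for the HHT view

def stft_view(y, n_fft=1024, hop_length=256):
    """Log-magnitude STFT spectrogram."""
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    return librosa.amplitude_to_db(spec, ref=np.max)

def wt_view(y, sr, n_scales=128, wavelet="morl"):
    """Continuous-wavelet scalogram with a Morlet mother wavelet."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(y, scales, wavelet, sampling_period=1.0 / sr)
    return np.log1p(np.abs(coeffs))

def hht_view(y, sr, max_imfs=6):
    """Instantaneous amplitude and frequency of the leading IMFs; binning them
    over a time-frequency grid yields a Hilbert spectrum image."""
    imfs = EMD()(y, max_imf=max_imfs)                 # (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=1)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=1)
    inst_freq = np.diff(phase, axis=1) * sr / (2.0 * np.pi)
    return amplitude[:, 1:], inst_freq

y, sr = librosa.load("bird_call.wav", sr=None)        # hypothetical audio file
stft_img = stft_view(y)
wt_img = wt_view(y, sr)
hht_amp, hht_freq = hht_view(y, sr)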
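The bagging and stacking ensembles (MVB-CNN, MVS-CNN) combine the outputs of the three view-specific base classifiers. A minimal sketch follows, assuming each trained base CNN already yields per-class probability arrays: bagging is approximated here by soft voting across views, and stacking by a logistic-regression meta-learner fitted on the concatenated base probabilities; the meta-learner choice and the soft-voting rule are assumptions, since the abstract does not specify them.

# Ensemble sketch over three view-specific base classifiers (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def bagging_vote(prob_stft, prob_wt, prob_hht):
    """MVB-CNN-style aggregation: average (soft-vote) the per-view class probabilities."""
    return np.argmax((prob_stft + prob_wt + prob_hht) / 3.0, axis=1)

def fit_stacking(prob_views_train, y_train):
    """MVS-CNN-style stacking: a meta-learner trained on concatenated base probabilities.
    prob_views_train: list of (n_samples, n_classes) arrays, one per view, ideally
    produced out-of-fold so the meta-learner does not overfit to the base models."""
    meta_features = np.hstack(prob_views_train)
    return LogisticRegression(max_iter=1000).fit(meta_features, y_train)

def predict_stacking(meta, prob_views_test):
    return meta.predict(np.hstack(prob_views_test))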
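The cascade ensemble (MVC-CNN) instead fuses the deep features themselves: the feature maps extracted by each view CNN are flattened, concatenated across views, and classified with an SVM. Below is a minimal PyTorch/scikit-learn sketch of this idea, in which the `features` sub-module, the SVM kernel and the C value are placeholder assumptions rather than the paper's configuration.

# MVC-CNN-style cascade fusion sketch (illustrative; architecture details are assumptions).
import numpy as np
import torch
from sklearn.svm import SVC

@torch.no_grad()
def deep_features(model, images):
    """Flattened feature maps from one trained view CNN.
    `model.features` is a hypothetical convolutional feature-extractor sub-module."""
    model.eval()
    feats = model.features(images)                    # (N, C, H, W)
    return torch.flatten(feats, start_dim=1).cpu().numpy()

def cascade_and_classify(view_models, views_train, y_train, views_test):
    """Concatenate (cascade) deep features across views, then classify with an SVM."""
    train_feats = np.hstack([deep_features(m, x) for m, x in zip(view_models, views_train)])
    test_feats = np.hstack([deep_features(m, x) for m, x in zip(view_models, views_test)])
    svm = SVC(kernel="rbf", C=1.0)                    # placeholder hyperparameters
    svm.fit(train_feats, y_train)
    return svm.predict(test_feats)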

Key words

feature extraction / multi-view / ensemble learning / convolutional neural network

Cite this article

LIU Jiang, ZHANG Yan, LYU Danju, et al. Birdsong classification research based on multi-view ensembles[J]. JOURNAL OF NANJING FORESTRY UNIVERSITY, 2023, 47(4): 23-30. https://doi.org/10.12302/j.issn.1000-2006.202208043
