Birdsong classification research based on multi-view ensembles

LIU Jiang, ZHANG Yan, LYU Danju, LU Jing, XIE Shanshan, ZI Jiali, CHEN Xu, ZHAO Youjie

JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2023, Vol. 47 ›› Issue (4): 23-30. DOI: 10.12302/j.issn.1000-2006.202208043

Abstract

【Objective】This study aimed to build a birdsong classification model with strong generalization ability that integrates multi-view features and makes full use of the available feature information, supporting research on bird species diversity conservation and intelligent ecological monitoring.

【Method】Audio recordings of 16 bird species were used as the research data. The short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) were used to generate three types of birdsong spectrograms, which constituted the multi-view feature data and served as the input to convolutional neural networks (CNNs); the base classifiers STFT-CNN, WT-CNN and HHT-CNN were trained on the corresponding views. The multi-view bagging ensemble convolutional neural network (MVB-CNN) and multi-view stacking ensemble convolutional neural network (MVS-CNN) models were then constructed using the bagging and stacking ensemble methods, respectively. Exploiting the feature extraction capability of CNNs, a multi-view cascade ensemble convolutional neural network (MVC-CNN) model was further proposed, in which the deep features extracted by the CNNs from the different views were cascaded and fused, and the classification results were obtained with a support vector machine (SVM).

【Result】The accuracies of the base classifiers WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; the accuracies of the ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and the accuracy of the multi-view cascade ensemble model MVC-CNN reached 95.76%. The accuracy of MVC-CNN was 6.65% to 14.76% higher than that of the single-view classification models, and 5.84% and 2.22% higher than that of MVB-CNN and MVS-CNN, respectively.

【Conclusion】The proposed MVC-CNN model fully exploits the complementary advantages of multi-view birdsong features, effectively improving classification performance with greater stability and better generalization ability, and provides a technical solution for multi-view birdsong classification research.
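The three views described in the Method section can be reproduced in outline with standard signal-processing libraries. The following Python sketch is an editorial illustration, not the authors' implementation: it assumes librosa for the STFT spectrogram, PyWavelets for a Morlet-wavelet scalogram, and the PyEMD package plus the Hilbert transform for a simple Hilbert-Huang representation; the file name, window sizes, scale range and wavelet choice are placeholder assumptions rather than parameters reported in the paper.

# Multi-view feature extraction sketch (illustrative only; all parameters are assumptions).
import numpy as np
import librosa                    # STFT spectrogram
import pywt                       # continuous wavelet transform (scalogram)
from PyEMD import EMD             # empirical mode decomposition (pip package "EMD-signal")
from scipy.signal import hilbert  # Hilbert transform for the HHT view

def stft_view(y, n_fft=1024, hop_length=256):
    """Log-magnitude STFT spectrogram."""
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    return librosa.amplitude_to_db(spec, ref=np.max)

def wt_view(y, sr, n_scales=128, wavelet="morl"):
    """Continuous-wavelet scalogram with a Morlet mother wavelet."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(y, scales, wavelet, sampling_period=1.0 / sr)
    return np.log1p(np.abs(coeffs))

def hht_view(y, sr, max_imfs=6):
    """Instantaneous amplitude and frequency of the leading IMFs; binning them
    over a time-frequency grid yields a Hilbert spectrum image."""
    imfs = EMD()(y, max_imf=max_imfs)                 # (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=1)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=1)
    inst_freq = np.diff(phase, axis=1) * sr / (2.0 * np.pi)
    return amplitude[:, 1:], inst_freq

y, sr = librosa.load("bird_call.wav", sr=None)        # hypothetical audio file
stft_img = stft_view(y)
wt_img = wt_view(y, sr)
hht_amp, hht_freq = hht_view(y, sr)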
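The bagging and stacking ensembles (MVB-CNN, MVS-CNN) combine the outputs of the three view-specific base classifiers. A minimal sketch follows, assuming each trained base CNN already yields per-class probability arrays: bagging is approximated here by soft voting across views, and stacking by a logistic-regression meta-learner fitted on the concatenated base probabilities; the meta-learner choice and the soft-voting rule are assumptions, since the abstract does not specify them.

# Ensemble sketch over three view-specific base classifiers (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def bagging_vote(prob_stft, prob_wt, prob_hht):
    """MVB-CNN-style aggregation: average (soft-vote) the per-view class probabilities."""
    return np.argmax((prob_stft + prob_wt + prob_hht) / 3.0, axis=1)

def fit_stacking(prob_views_train, y_train):
    """MVS-CNN-style stacking: a meta-learner trained on concatenated base probabilities.
    prob_views_train: list of (n_samples, n_classes) arrays, one per view, ideally
    produced out-of-fold so the meta-learner does not overfit to the base models."""
    meta_features = np.hstack(prob_views_train)
    return LogisticRegression(max_iter=1000).fit(meta_features, y_train)

def predict_stacking(meta, prob_views_test):
    return meta.predict(np.hstack(prob_views_test))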
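The cascade ensemble (MVC-CNN) instead fuses the deep features themselves: the feature maps extracted by each view CNN are flattened, concatenated across views, and classified with an SVM. Below is a minimal PyTorch/scikit-learn sketch of this idea, in which the `features` sub-module, the SVM kernel and the C value are placeholder assumptions rather than the paper's configuration.

# MVC-CNN-style cascade fusion sketch (illustrative; architecture details are assumptions).
import numpy as np
import torch
from sklearn.svm import SVC

@torch.no_grad()
def deep_features(model, images):
    """Flattened feature maps from one trained view CNN.
    `model.features` is a hypothetical convolutional feature-extractor sub-module."""
    model.eval()
    feats = model.features(images)                    # (N, C, H, W)
    return torch.flatten(feats, start_dim=1).cpu().numpy()

def cascade_and_classify(view_models, views_train, y_train, views_test):
    """Concatenate (cascade) deep features across views, then classify with an SVM."""
    train_feats = np.hstack([deep_features(m, x) for m, x in zip(view_models, views_train)])
    test_feats = np.hstack([deep_features(m, x) for m, x in zip(view_models, views_test)])
    svm = SVC(kernel="rbf", C=1.0)                    # placeholder hyperparameters
    svm.fit(train_feats, y_train)
    return svm.predict(test_feats)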

Key words

feature extraction / multi-view / ensemble learning / convolutional neural network

Cite this article

LIU Jiang, ZHANG Yan, LYU Danju, et al. Birdsong classification research based on multi-view ensembles[J]. JOURNAL OF NANJING FORESTRY UNIVERSITY, 2023, 47(4): 23-30. https://doi.org/10.12302/j.issn.1000-2006.202208043
