基于多视图集成的鸟鸣分类研究

刘江, 张雁, 吕丹桔, 鲁静, 谢珊珊, 子佳丽, 陈旭, 赵友杰

南京林业大学学报(自然科学版), 2023, Vol. 47, Issue (4): 23-30. DOI: 10.12302/j.issn.1000-2006.202208043
Special topic: Selected Papers from the 3rd China Forestry and Grassland Computer Application Conference (Ⅱ) (Executive Editor-in-Chief: LI Fengri)


Birdsong classification research based on multi-view ensembles


摘要

【目的】尝试融合多视图特征来最大化特征信息,提出多视图级联集成卷积神经网络(MVC-CNN)鸟鸣音分类方法,构建泛化性较强的鸟鸣分类模型,以促进鸟类物种多样性保护和生态环境智能监测的深入研究。【方法】以16种鸟鸣音频数据为研究对象,采用短时傅里叶变换(STFT)、小波变换(WT)和希尔伯特-黄变换(HHT)等特征提取方法生成鸟鸣音的3类谱图以构成多视图特征数据,并作为卷积神经网络(CNN)的输入,训练不同视图的基分类器STFT-CNN、WT-CNN和HHT-CNN;分别采用Bagging和Stacking集成方法构建了多视图Bagging集成卷积神经网络(MVB-CNN)模型和多视图Stacking集成卷积神经网络(MVS-CNN)模型。基于CNN强大的特征提取能力,提出了多视图级联集成卷积神经网络(MVC-CNN)模型,将不同视图经CNN提取得到的深度特征级联融合,以支持向量机(SVM)为最终分类器获得分类结果。【结果】构建的基分类模型WT-CNN、STFT-CNN、HHT-CNN的准确率分别为89.11%、88.36%和81.00%;多视图集成模型MVB-CNN和MVS-CNN的准确率分别为89.92%和93.54%,多视图级联集成模型MVC-CNN的准确率为95.76%。MVC-CNN模型准确率比单一视图基分类模型提升6.65%~14.76%,比MVB-CNN和MVS-CNN分别提升5.84%和2.22%。【结论】研究提出的MVC-CNN模型能充分结合鸟鸣多视图特征的优势,有效提升鸟鸣分类效果,具有较高的稳定性和更好的泛化能力,为多视图鸟鸣音分类研究提供技术方案。
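To make the multi-view construction above concrete, the following is a minimal Python sketch (not the authors' code) of how STFT, WT, and an HHT-style time-frequency view might be generated from one recording. It assumes librosa, PyWavelets, and SciPy; the file name `birdsong.wav`, the window sizes, and the wavelet scales are illustrative assumptions, and the HHT view here skips the empirical mode decomposition step of a full Hilbert-Huang transform.

```python
# Illustrative sketch of building three time-frequency "views" for one clip.
# Assumptions: librosa, PyWavelets (pywt) and SciPy are installed; "birdsong.wav"
# is a hypothetical input file; all parameter choices below are placeholders.
import numpy as np
import librosa
import pywt
from scipy.signal import hilbert

y, sr = librosa.load("birdsong.wav", sr=22050, mono=True)

# View 1: STFT magnitude spectrogram.
stft_view = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# View 2: continuous wavelet transform scalogram (Morlet wavelet).
scales = np.arange(1, 128)
wt_coeffs, _ = pywt.cwt(y, scales, "morl", sampling_period=1.0 / sr)
wt_view = np.abs(wt_coeffs)

# View 3: crude Hilbert spectrum (a full HHT would first decompose the signal
# into intrinsic mode functions via EMD; that step is omitted here for brevity).
analytic = hilbert(y)
inst_amp = np.abs(analytic)[1:]
inst_freq = np.diff(np.unwrap(np.angle(analytic))) * sr / (2.0 * np.pi)
t = np.arange(len(inst_freq)) / sr
hht_view, _, _ = np.histogram2d(inst_freq, t, bins=(128, 256), weights=inst_amp)

print(stft_view.shape, wt_view.shape, hht_view.shape)
```

Each view would then be rendered or resized into a fixed-size spectrogram image before being fed to its corresponding CNN.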

Abstract

【Objective】This study attempted to fuse multi-view features to maximize feature information, proposed a multi-view cascaded ensemble convolutional neural network (MVC-CNN) method for birdsong classification, and built a birdsong classification model with strong generalization ability, so as to promote further research on bird species diversity protection and intelligent monitoring of the ecological environment.【Method】Using audio recordings of 16 bird species as the research objects, the short-time Fourier transform (STFT), wavelet transform (WT) and Hilbert-Huang transform (HHT) were used to generate three types of birdsong spectrograms constituting the multi-view feature data, which served as the input of convolutional neural networks (CNNs) to train the base classifiers STFT-CNN, WT-CNN and HHT-CNN for the different views. The multi-view bagging ensemble convolutional neural network (MVB-CNN) and multi-view stacking ensemble convolutional neural network (MVS-CNN) models were then constructed using the bagging and stacking ensemble methods, respectively. Drawing on the powerful feature extraction capability of CNNs, the multi-view cascaded ensemble convolutional neural network (MVC-CNN) model was proposed, in which the deep features extracted by the CNNs from the different views were cascaded and fused, and the classification results were obtained with a support vector machine (SVM) as the final classifier.【Result】The accuracies of the base classification models WT-CNN, STFT-CNN and HHT-CNN were 89.11%, 88.36% and 81.00%, respectively; the accuracies of the multi-view ensemble models MVB-CNN and MVS-CNN were 89.92% and 93.54%, respectively; and the accuracy of the multi-view cascaded ensemble model MVC-CNN was 95.76%. The accuracy of the MVC-CNN model was 6.65%-14.76% higher than that of the single-view base classification models, and 5.84% and 2.22% higher than that of the MVB-CNN and MVS-CNN models, respectively.【Conclusion】The proposed MVC-CNN model fully combines the advantages of the multi-view features of birdsong, effectively improves birdsong classification performance, and shows high stability and better generalization ability, providing a technical solution for multi-view birdsong classification research.
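The cascade-fusion step described above (per-view CNN deep features concatenated and classified with an SVM) can be sketched as follows. This is an illustrative outline, assuming PyTorch and scikit-learn; the `ViewCNN` backbone, its layer sizes, and the random stand-in spectrograms are hypothetical placeholders rather than the MVC-CNN architecture reported in the paper.

```python
# Sketch of cascade fusion of per-view CNN features followed by an SVM classifier.
import torch
import torch.nn as nn
from sklearn.svm import SVC


class ViewCNN(nn.Module):
    """Small per-view CNN used here only as a deep-feature extractor (placeholder)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(32 * 4 * 4, feat_dim)

    def forward(self, x):  # x: (N, 1, H, W) batch of spectrogram images
        return self.fc(self.conv(x).flatten(1))


def cascade_features(models, views):
    """Extract deep features from each view's CNN and concatenate them."""
    with torch.no_grad():
        feats = [m(v) for m, v in zip(models, views)]
    return torch.cat(feats, dim=1).numpy()  # shape: (N, 3 * feat_dim)


if __name__ == "__main__":
    # Random stand-ins for batches of STFT / WT / HHT spectrogram images.
    views_train = [torch.randn(32, 1, 64, 64) for _ in range(3)]
    views_test = [torch.randn(8, 1, 64, 64) for _ in range(3)]
    y_train = torch.randint(0, 16, (32,)).numpy()  # 16 bird species labels

    cnns = [ViewCNN().eval() for _ in range(3)]  # in practice: trained per view
    svm = SVC(kernel="rbf").fit(cascade_features(cnns, views_train), y_train)
    print(svm.predict(cascade_features(cnns, views_test)))
```

In the paper's pipeline each per-view CNN is first trained as a base classifier on its own spectrogram view; the untrained networks here are only stand-ins for demonstrating the feature cascade.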

关键词

特征提取 / 多视图 / 集成学习 / 卷积神经网络

Key words

feature extraction / multi-view / ensemble learning / convolutional neural network

引用本文

刘江, 张雁, 吕丹桔, 等. 基于多视图集成的鸟鸣分类研究[J]. 南京林业大学学报(自然科学版), 2023, 47(4): 23-30. https://doi.org/10.12302/j.issn.1000-2006.202208043
LIU Jiang, ZHANG Yan, LYU Danju, et al. Birdsong classification research based on multi-view ensembles[J]. Journal of Nanjing Forestry University, 2023, 47(4): 23-30. https://doi.org/10.12302/j.issn.1000-2006.202208043
CLC number: TN18; S718

基金 (Funding)

Major Science and Technology Special Project of Yunnan Province (202002AA10007)
National Natural Science Foundation of China (61462078)
National Natural Science Foundation of China (31860332)
Scientific Research Fund Project of the Education Department of Yunnan Province (2021Y219)
Scientific Research Fund Project of the Education Department of Yunnan Province (2022Y558)

编辑 (Editor): 王国栋