Birdsong classification research based on multi-view ensembles

LIU Jiang ¹; ZHANG Yan ²,*; LYU Danju ³,*; LU Jing ³; XIE Shanshan ³; ZI Jiali ³; CHEN Xu ³; ZHAO Youjie ³

doi:10.12302/j.issn.1000-2006.202208043


Journal of Nanjing Forestry University (Natural Sciences Edition), 2023, Vol. 47, Issue 4: 23-30.


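Special topic: Selected papers from the 3rd China Forestry and Grassland Computer Application Conference (II) (Executive Editor-in-Chief: LI Fengri)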



Abstract

【Objective】This study fuses multi-view features to maximize the available feature information and proposes a multi-view cascaded ensemble convolutional neural network (MVC-CNN) for birdsong classification. The aim is a classification model with strong generalization that supports further research on the protection of bird species diversity and on intelligent monitoring of the ecological environment.【Method】Using 16 types of birdsong audio data as the research object, the short-time Fourier transform (STFT), wavelet transform (WT), and Hilbert-Huang transform (HHT) were applied to generate three types of birdsong spectrogram, which together form the multi-view feature data. These spectrograms served as the input of convolutional neural networks (CNNs), yielding one base classifier per view: STFT-CNN, WT-CNN, and HHT-CNN. The Bagging and Stacking ensemble methods were then used to build a multi-view Bagging ensemble CNN (MVB-CNN) and a multi-view Stacking ensemble CNN (MVS-CNN), respectively. Finally, drawing on the feature extraction capability of CNNs, the MVC-CNN model was proposed: the deep features that the CNNs extract from the different views are concatenated (cascaded), and a support vector machine (SVM) serves as the final classifier.【Result】The base classifiers WT-CNN, STFT-CNN, and HHT-CNN reached accuracies of 89.11%, 88.36%, and 81.00%, respectively. The multi-view ensemble models MVB-CNN and MVS-CNN reached 89.92% and 93.54%, and the multi-view cascaded ensemble model MVC-CNN reached 95.76%. MVC-CNN thus exceeds the single-view base classifiers by 6.65 to 14.76 percentage points, and exceeds MVB-CNN and MVS-CNN by 5.84 and 2.22 percentage points, respectively.【Conclusion】The proposed MVC-CNN model combines the advantages of the multi-view birdsong features and effectively improves birdsong classification, with higher stability and better generalization ability, providing a technical solution for multi-view birdsong classification research.
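The abstract contrasts decision-level ensembles (MVB-CNN, MVS-CNN) with feature-level cascading (MVC-CNN). The following minimal Python sketch illustrates the two ideas on toy data only. It is not the authors' implementation: the view names, function names, and the use of majority voting as the aggregation rule are assumptions made for illustration, since the abstract does not specify how the Bagging ensemble aggregates its base classifiers.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical view keys; the abstract names the transforms, not code identifiers.
VIEWS = ("stft", "wt", "hht")


def majority_vote(view_predictions: Dict[str, Sequence[int]]) -> List[int]:
    """Decision-level fusion: every view's classifier votes on each clip.

    Assumption: plain majority voting. Ties go to the view listed first
    in VIEWS, an arbitrary choice made for this sketch.
    """
    n_clips = len(view_predictions[VIEWS[0]])
    fused = []
    for i in range(n_clips):
        votes = [view_predictions[v][i] for v in VIEWS]
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote (in VIEWS order) that reaches the top count.
        fused.append(next(label for label in votes if counts[label] == best))
    return fused


def cascade_features(
    view_features: Dict[str, Sequence[Sequence[float]]]
) -> List[List[float]]:
    """Feature-level fusion: concatenate per-view feature vectors clip by clip."""
    n_clips = len(view_features[VIEWS[0]])
    return [
        [x for v in VIEWS for x in view_features[v][i]]
        for i in range(n_clips)
    ]


if __name__ == "__main__":
    # Toy class predictions for three clips from the three base classifiers.
    preds = {"stft": [0, 1, 2], "wt": [0, 2, 2], "hht": [1, 2, 0]}
    print(majority_vote(preds))  # [0, 2, 2]

    # Toy "deep features" for two clips; real CNN features would be much longer.
    feats = {
        "stft": [[0.1, 0.2], [0.3, 0.4]],
        "wt": [[1.0], [2.0]],
        "hht": [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]],
    }
    print(cascade_features(feats))
```

In the MVC-CNN model described above, the concatenated vectors would then be passed to an SVM as the final classifier; a class such as scikit-learn's `sklearn.svm.SVC` could fill that role, although the abstract does not say which SVM implementation or kernel was used.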

Cite this article

LIU Jiang, ZHANG Yan, LYU Danju, et al. Birdsong classification research based on multi-view ensembles[J]. Journal of Nanjing Forestry University (Natural Sciences Edition), 2023, 47(4): 23-30. https://doi.org/10.12302/j.issn.1000-2006.202208043

CLC number: TN18; S718
