1
Paper
Author
Title
publisher
keywords
year
sound classification applications/use cases
algorithms
preprocessing
Feature Extraction
steps/process
Denoising techniques
Tools used
Context awareness in sound classification
design/model/framework
setup images
field tests
results
challenges/limitations
Accuracy Levels
Other Information
Classifies
formula
graphs
Algorithm/Flowchart
Architecture
Pseudocode
Network/Component Diagram
2
ACM1
Hong R. et al.
Video Accessibility Enhancement for Hearing-Impaired Users
ACM
Accessibility, dynamic captioning, hearing impairment
2011
Dynamic Captioning
Viola Jones, Haar feature based cascade mouth detector
script location, script-speech alignment, and voice volume estimation
videos along with scripts but can be extended to process general videos without scripts
Dynamic captioning puts scripts at suitable positions to help hearing-impaired audiences better recognize the speakers
script location, script-speech alignment, and voice volume estimation
Video Feeds
better tracking of the scripts and perception of the moods conveyed by the variation of volume
Focuses on dynamic captioning rather than the user interface
80
Speech in video to dynamic caption
Y
Gaussian distance, linear representation
y
Y
accessibility enhancement & script-speech alignment
face mapping
3
ACM2
Wang W. et al.
A Smartphone-based Digital Hearing Aid to Mitigate Hearing Loss at Specific Frequencies
ACM
Digital hearing aids, smartphone, sound classification
2014
Hearing Loss of certain frequencies among elderly
GMM classifier, WOLA (Weighted Overlap-Add) filter bank
Speech processing in the frequency domain and sound classification to classify input sounds into speech and speech-with-noise categories. WOLA analysis filter banks then split the sound into different frequency bands, which are amplified (or attenuated) in the specific frequency ranges at which the user's hearing is impaired. Finally, the WOLA synthesis filter bank reconstructs the acoustic signal from the amplified sub-band signals, which is sent to the receiver for playout.
Smartphones
acoustic signals
Frequency-domain processing is currently slow due to its computational complexity
Audio Frequencies
Y
Y
Y
hearing aid app (Application and Storage)
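The frequency-domain pipeline this row describes (analysis filter bank, per-band amplification at the impaired frequencies, synthesis) can be sketched briefly. SciPy's STFT/ISTFT stands in for the paper's WOLA filter bank, and the band layout and gains are invented for the example:

```python
import numpy as np
from scipy.signal import stft, istft

def compensate(x, fs, band_gains_db, nperseg=256):
    """Boost the frequency bands where the user's hearing is impaired.

    SciPy's STFT/ISTFT stands in for the paper's WOLA analysis/synthesis
    filter bank; the uniform band layout is an illustrative assumption.
    """
    f, _, Z = stft(x, fs=fs, nperseg=nperseg)             # analysis filter bank
    edges = np.linspace(0, len(f), len(band_gains_db) + 1).astype(int)
    for b, gain_db in enumerate(band_gains_db):           # per-band gain
        Z[edges[b]:edges[b + 1]] *= 10 ** (gain_db / 20)
    _, y = istft(Z, fs=fs, nperseg=nperseg)               # synthesis filter bank
    return y

# Example: boost the upper half of the spectrum by 12 dB.
fs = 16000
x = np.random.randn(fs)  # placeholder for a recorded signal
y = compensate(x, fs, band_gains_db=[0, 0, 12, 12])
```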
4
ACM3
Bountourakis V. et al.
Machine Learning Algorithms for Environmental Sound Recognition: Towards Soundscape Semantics
ACM
Environmental Sound Recognition, audio classification, semantic audio analysis, computer audition, feature extraction, feature selection, machine learning algorithms
2015
Comparison between algorithms for sound classification
• k-Nearest Neighbors (k-NN)
• Naive Bayes
• Support Vector Machines (SVM)
• C4.5 algorithm (decision tree)
• Logistic Regression
• Artificial Neural Networks (ANN)
stationary (frequency-based) feature extraction and non-stationary (time-frequency-based) feature extraction
database, segmentation, feature extraction, feature selection, classification, evaluation
Environmental Sounds
database, segmentation, feature extraction, feature selection, classification, evaluation
Sound Signals
The highest classification rates were achieved by k-NN with feature set 3 (85.8%), ANN with feature set 2 and PCA (86.95%), and SVM with feature set 2 and PCA (85.41%).
airplanes, alarms, applause, birds, dogs, footsteps, motorcycles, rain, rivers, sea waves, thunders, wind.
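A minimal scikit-learn sketch of the comparison this row describes: the six listed classifiers evaluated on one feature matrix after standardization and PCA. The feature values are random placeholders; the paper's feature sets and tuning are not reproduced:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 200 clips x 40 features, 12 classes (as in the row above).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 40)), rng.integers(0, 12, size=200)

classifiers = {
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(),  # CART as a stand-in for C4.5
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "ANN": MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    # Standardize, keep 95% of the variance with PCA, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```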
5
ACM4
Bragg D. et al.
A Personalizable Mobile Sound Detector App Design for Deaf and Hard-of-Hearing Users
ACM
Sound detection, accessibility, deaf, hard-of-hearing
2016
Notifying deaf and hard-of-hearing people about sounds around them
Smartphones, mobile application
Accessibility
Sound Signals
87 participants (51 female, 36 male); 50 were deaf and 37 were hard-of-hearing. Ages ranged from 18 to 99 (mean 42, SD 17).
An app design usable by deaf and hard-of-hearing users, who record their own training examples of sounds
70
No participants used apps to monitor sounds outside of the study
Participants revealed they wanted classifications of dropping items, walking/running behind them, moving carts, fire drills, printers, conversations, and baby sounds.
vehicles passing by, children having bad dreams, smoke and carbon monoxide detectors, appliances making unusual noises, water running, socializing, something dropping on the floor, gunshots, conversations, and distinguishing between multiple sources with a similar frequency range
Users train the system using the sounds at home
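Because users train the system on sounds recorded at home, the core loop is enroll-then-match. A hypothetical sketch using librosa MFCC summaries and nearest-neighbor matching; the feature choice and distance threshold are assumptions, not the paper's design:

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def embed(path):
    """One MFCC summary vector per clip."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

enrolled = {}  # label -> list of embeddings from user recordings

def enroll(label, path):
    enrolled.setdefault(label, []).append(embed(path))

def detect(path, threshold=60.0):
    """Return the nearest enrolled label, or None if nothing is close."""
    q = embed(path)
    dist, label = min(((np.linalg.norm(q - e), lbl)
                       for lbl, vecs in enrolled.items() for e in vecs),
                      default=(None, None))
    return label if dist is not None and dist < threshold else None
```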
6
ACM6
Kurnaz S. & Aljabery M.
Predict the type of hearing aid of audiology patients using data mining techniques
ACM
audiology, National Health System, audiograms, BTE, ITE, Machine Learning, Data Mining, Hearing Aid.
2018
Choice of hearing aid
AdaBoost classifier, Random forests classification, Logistic Regression, Orange Canvas Modeler
Hearing aid choice
ML Model
7
ACM7
Li M. et al.
Environmental Noise Classification Using Convolution Neural Networks
ACM
Environmental noise; Convolution Neural Network (CNN); Short-Time Fourier Transform (STFT); Log Mel-Frequency Spectral Coefficients (MFSCs); Tensorflow
2018
CNN
Short-Time Fourier Transform (STFT)
Environment
Log Mel-Frequency Spectral Coefficients
Y
ML, STFT
CNN
8
ACM8
Alsouda Y. et al.
IoT-based Urban Noise Identification Using Machine Learning: Performance of SVM, KNN, Bagging, and Random Forest
ACM
urban noise; smart cities; support vector machine (SVM); k-nearest neighbors (KNN); bootstrap aggregation (Bagging); random forest; mel-frequency cepstral coefficients (MFCC); internet of things (IoT).
2019
classification of environmental sounds
SVM, KNN, Bagging, Random Forest
mel-frequency cepstral coefficients (MFCC)
Feature extraction, model training, classifier, prediction
Raspberry Pi, microphone HAT
Environment
Feature extraction, model training, classifier, prediction
Sound
High noise identification accuracy, in the range 88%–94%.
Accuracy [%] by classifier: SVM 93.87, KNN 93.88, Bagging 87.81, Random Forest 89.91
quietness, silence, car horn, children playing, gunshot, jackhammer, siren, and street music
K-Nearest Neighbors
ML, MFCC
9
ACM9
Wang et al.
Privacy-aware environmental sound classification for indoor human activity recognition
ACM
Smart Buildings, Privacy-aware Environmental Sound Recognition, Voice Bands Stripping, Internet Of Things, Computational Efficiency, Web Crawling, Mel Frequency Cepstral Coefficients, Linear Predictive Cepstral Coefficients, Support Vector Machine
2019
indoor environmental sound classification
Decision tree, Random Forest, Mixed Gaussian, Naive Bayes, SVM(Linear & RBF kernel), Artificial Neural Network
Environment
0.9
Y
ML, Feature extraction
10
ACM10
Inik O. & Seker H.
Convolutional Neural Networks for the Classification of Environmental Sounds
ACM
Environmental sound classification (ESC), Deep Learning, Convolutional Neural Networks (CNN), Urbansound8k
2020
classification of environmental sounds
CNN
Intel Core i9-7900X 3.30 GHz ×20 processor, 64 GB RAM, and 2× GeForce RTX 2080 Ti graphics cards. MATLAB R2020a 64-bit (win64)
Environment
0.825
air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music
CNN
11
ACM11
Sigtia S. et al.
Automatic Environmental Sound Recognition: Performance Versus Computational Cost
ACM
Automatic environmental sound recognition, computational auditory scene analysis, deep learning, machine learning.
2016
classification of environmental sounds
Gaussian Mixture Models, SVM, DNN, RNN
Mel-frequency cepstral coefficient (MFCC)
Baby Cry Data Set, Smoke Alarm Data Set
Environment
Deep Neural Networks yield the best ratio of sound classification accuracy across a range of computational costs, while Gaussian Mixture Models offer a reasonable accuracy at a consistently small cost, and Support Vector Machines stand between the two in terms of the compromise between accuracy and computational cost.
smoke alarms and baby cries
Gaussian Mixture Models, SVM, DNN (Feed forward) RNN
Y
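The GMM baseline in this row is usually one mixture model per class, with prediction by highest log-likelihood. A minimal scikit-learn sketch under that assumption (the component count is an arbitrary choice):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmms(X, y, n_components=4):
    """One GMM per class, fit on that class's feature vectors (e.g. MFCCs)."""
    return {c: GaussianMixture(n_components=n_components).fit(X[y == c])
            for c in np.unique(y)}

def predict(gmms, X):
    """Pick the class whose GMM gives the highest log-likelihood."""
    labels = list(gmms)
    scores = np.column_stack([gmms[c].score_samples(X) for c in labels])
    return np.asarray(labels)[scores.argmax(axis=1)]
```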
12
ACM12
Laar V. & Vries B.
A Probabilistic Modeling Approach to Hearing Loss Compensation
ACM
Hearing aids, hearing loss compensation, probabilistic modeling, factor graphs, message passing, machine learning
2016
Hearing aid (HA) algorithm tuning; a probabilistic modeling approach to the design of HA algorithms
Bayes factor (BF)
Evaluation of signal processing (SP), parameter estimation (PE), and model comparison (MC) tasks
Speech Understanding
performance evaluation, signal processing
Y
hearing aid signal processing
hearing aid agent
13
ACM13
Salehi H. et al.
Learning-Based Reference-Free Speech Quality Measures for Hearing Aid Applications
ACM
Hearing aids, speech quality, perceptual linear prediction, gammatone filterbank energies, reference-free quality assessment, support vector regression, machine learning.
2018
Speech quality of hearing aids
Speech Understanding
A group of 18 HI listeners was recruited to provide the speech quality ratings
Linear prediction
Feature extraction
14
IEEE1
Demir F. et al.
A New Deep CNN Model for Environmental Sound Classification
IEEE
Environmental sound classification, spectrogram images, CNN model, deep features
2020
Environmental sound classification
CNN
The spectrogram method converts the signals into time-frequency images, i.e., the loudness of a signal over time at the different frequencies present in a specific waveform
deep feature extraction
DCASE-2017 ASC and the UrbanSound8K datasets
Environment
86.7
air conditioner, car horn, children, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music
STFT, CNN, Accuracy
Y
ML, CNN, KNN
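The row above feeds spectrogram images to a CNN for deep feature extraction. A minimal PyTorch sketch of that idea; the layer sizes and depth are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ESCNet(nn.Module):
    """Tiny CNN over log-spectrogram 'images' (batch, 1, mel_bins, frames)."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

logits = ESCNet()(torch.randn(4, 1, 64, 128))  # 4 spectrogram images
```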
15
IEEE2
Ridha A. & Shehieb W.
Assistive Technology for Hearing-Impaired and Deaf Students Utilizing Augmented Reality
IEEE
Assistive technology; Augmented Reality; Deaf; Education; Hearing-Impairment; Machine Learning.
2021
augmented reality glasses that will assist students in their educational journey with real-time transcribing, speech emotion recognition, sound indications features, as well as classroom assistive tools.
AR Glasses
Environmental Sounds
A live transcription feature that uses Google Cloud services; transcribed lectures are stored for future reference in the classroom tools feature and can be shared with other students, making the platform usable for communication between students
71.3
Car Horn, Siren, Gunshots, Broken Glass
ML
PCB Schematic
16
IEEE3
Melati A. & Karyono K.
ANDROID BASED SOUND DETECTION APPLICATION FOR HEARING-IMPAIRED USING ADABOOSTM1 CLASSIFIER WITH REPTREE WEAKLEARNER
IEEE
sound detection for hearing-impaired, machine learning, AdaBoostM1, REPTree, Android
2014
Help hearing-impaired people detect and recognize sounds around them
AdaBoostM1 functioning as a classifier and REPTree as weak learner
Two databases: the first contains indoor sounds and the second outdoor sounds, with a total of 23 sounds
Environment
Low accuracy; a better approach is needed
40
baby crying, beep, broom sweeps, door creaking, door slam, door bell, foot step, hairdryer, knocking door, ringing, water runs, whistle, airplanes, applause, birds chirp, car honk, crowded, dog bark, engine start, screaming, thunder, train, wind blowing
Y
ML
AdaBoostM1, Bagging
17
IEEE4
Chen C. et al.
Audio-Based Early Warning System of Sound Events on the Road for Improving the Safety of Hearing-Impaired People
IEEE
Android application, warning, audio detection, machine learning
2019
Road safety for the hearing-impaired
CNNs
MFCC
UrbanSound8K
Safety
CNN is effective for environmental sound classification tasks with appropriate parameter settings and feature sets
66.4
Car-approaching, Car-horn, Children-playing, Dog-barking, Gun-shot, Construction, Siren, Engine-idling
MFCC
Y
ML
18
IEEE5
Bhat G. et al.
Automated machine learning based speech classification for hearing aid applications and its real-time implementation on smartphone
IEEE
Automated Machine Learning, AutoML, Voice Activity Detection (VAD), Hearing aid devices (HADs), smartphone, real-time
2020
speech classification
AutoML based VAD, CNN
Speech Understanding
Speech
Signal model and training feature
Y
ML
19
IEEE6
Healy E. & Yoho S.
Difficulty understanding speech in noise by the hearing impaired: Underlying causes and technological solutions
IEEE
2016
poor speech understanding
single-microphone algorithm to extract speech from noise, DNN
Speech Understanding
Groups of 10 NH and 10 HI subjects heard IEEE sentences in unprocessed speech-plus-noise conditions and corresponding algorithm-processed conditions. In this study, multi-talker babble and cafeteria noise, each at two SNRs, were employed.
Speech
ML
20
IEEE7
Jatturas C. et al.
Feature-based and Deep Learning-based Classification of Environmental Sound
IEEE
2019
comparison techniques for environmental sound classification
SVM, MLP, Deep Learning
Urban Sound 8k, Scikit-learn and Tensorflow
Environment
75
Air conditioner, children playing, engine idling, siren, and street music
STFT, NN, SVM
Y
CNN
21
IEEE8
Saleem N. et al.
Machine Learning Approach for Improving the Intelligibility of Noisy Speech
IEEE
Machine learning, speech enhancement, intelligibility, time-frequency masking, deep neural networks
2020
Intelligibility of Noisy Speech
Speech Understanding
RNN
Y
RNN
22
IEEE9
Jatturas C. et al.
Recurrent Neural Networks for Environmental Sound Recognition using Scikit-learn and Tensorflow
IEEE
2019
Environmental sound classification
MLP, SVM
MFCC
Urban Sound 8k
Environment
deep neural network models outperform both MLP and SVM with PCA
90
Car-approaching, Car-horn, Children-playing, Dog-barking, Gun-shot, Construction, Siren, Engine-idling
STFT, SVM
Y
RNN
23
IEEE10
Davis N. & Suresh K.
Environmental Sound Classification using Deep Convolutional Neural Networks and Data Augmentation
IEEE
2018
Environmental sound classification
Deep Convolutional Neural Network
Time Stretching, pitch shifting, Dynamic Range Compression, Background Noise, Linear Prediction Cepstral Coefficients (LPCC)
Urbansound 8K
Environment
80
air conditioner, car horns, children playing, dog bark, drilling, engine idling, gunshot, jackhammers, siren and street music
LPCC
Y
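The augmentations this row lists map to a few library calls. A sketch using librosa and NumPy; the parameter values and the tanh stand-in for dynamic range compression are assumptions:

```python
import numpy as np
import librosa

def augment(y, sr, rng=np.random.default_rng()):
    """Return augmented copies of one audio clip y sampled at rate sr."""
    return {
        "time_stretch": librosa.effects.time_stretch(y, rate=1.1),
        "pitch_shift": librosa.effects.pitch_shift(y, sr=sr, n_steps=2),
        "background_noise": y + 0.005 * rng.standard_normal(len(y)),
        # Crude tanh waveshaper as a stand-in for dynamic range compression.
        "drc": np.tanh(3.0 * y) / np.tanh(3.0),
    }
```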
24
IEEE11
Chu S. et al.
Environmental Sound Recognition With Time–Frequency Audio Features
IEEE
Audio classification, auditory scene recognition, data representation, feature extraction, feature selection, matching pursuit, Mel-frequency cepstral coefficient (MFCC)
2009
Environmental sound classification
Environment
Restaurant, Casino, Train, Rain, and Street ambulance
short time energy, zero crossing rate, signal decomposition
Y
25
IEEE12
Chu S. et al.
WHERE AM I? SCENE RECOGNITION FOR MOBILE ROBOTS USING AUDIO FEATURES
IEEE
2006
Environmental sound classification
Environment
26
IEEE13
Ullo S. et al.
Hybrid Computerized Method for Environmental Sound Classification
IEEE
Environmental sound classification, Optimal allocation sampling, spectrogram, convolutional neural network, classification techniques
2020
Environmental sound classification
AlexNet and Visual Geometry Group (VGG)-16 networks; decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse and weighted kernel), support vector machine, linear discriminant analysis, bagged tree and softmax classifiers
short-time Fourier transform (STFT)
Deep Feature Extraction
ESC-10, a ten-class environmental sound dataset,
The experiments were carried out in MATLAB (R2018) on a computer with an Intel i7 third-generation 3.4 GHz processor, 8 GB RAM, and 64-bit memory.
Environment
AlexNet (FC-6) features with a fine-kernel decision tree reach 89.9%.
90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% are obtained with a decision tree, k-nearest neighbor, support vector machine, linear discriminant analysis, bagged tree, and softmax classifier, respectively.
The methods proposed until now have been limited in terms of performance, so an effective and robust method is required to classify environmental signals accurately. In the present work, the authors propose a method in which the dimension of the data is reduced by OAS. The reduced data are then transformed into images by STFT, and several features are extracted from the spectrograms using two pre-trained CNNs.
Classes from dataset
STFT, Sample Size,
Y
ML
CNN
27
IEEE14
Zhang X. et al.
Dilated Convolution Neural Network with LeakyReLU for Environmental Sound Classification
IEEE
Environmental sound classification; Dilated Convolution Neural Network; Leaky Rectified Linear Unit; Activation Function
2017
Environmental sound classification
a dilated CNN-based ESC (D-CNN-ESC)
Transforming acoustic waves into low-level feature vectors following a commonly used method
UrbanSound8K, ESC50, and CICESE
Environment
The proposed D-CNN-ESC system outperforms the state-of-the-art ESC results obtained by a very deep CNN-ESC system on the UrbanSound8K dataset; the absolute error of the method is about 10% less than that of the compared method.
All classes in the 3 datasets
Y
ML
CNN
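A minimal PyTorch sketch of the two ingredients this row names, dilated convolutions and LeakyReLU; the channel counts and dilation schedule are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Stacked dilated convolutions grow the receptive field without pooling;
# padding = dilation keeps the feature-map size for 3x3 kernels.
block = nn.Sequential(
    nn.Conv2d(1, 32, 3, dilation=1, padding=1), nn.LeakyReLU(0.1),
    nn.Conv2d(32, 32, 3, dilation=2, padding=2), nn.LeakyReLU(0.1),
    nn.Conv2d(32, 32, 3, dilation=4, padding=4), nn.LeakyReLU(0.1),
)
out = block(torch.randn(1, 1, 64, 128))  # (batch, ch, mel_bins, frames)
```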
28
IEEE15
Han B. & Hwang E.
ENVIRONMENTAL SOUND CLASSIFICATION BASED ON FEATURE COLLABORATION
IEEE
Environmental sound recognition, discrete chirplet transform, discrete curvelet transform, discrete Hilbert transform, feature extraction
2009
Environmental sound classification
SVM
We then applied equal-loudness level contours to each frame, to ensure that the signal more accurately represented human sound perception, and we eliminated the silence signal from the start and end points of each frame.
For traditional features, we collected mel-frequency cepstral coefficients (MFCC), zero-crossing rate (ZCR), spectral centroid (SC), spectral spread (SS), spectral flatness (SF), and spectral flux (SFX).
Environment
CDFs and ATFs are more effective than TFs for classification. Furthermore, when combined with TFs, they achieved the maximum accuracy.
three types of features: traditional features (TFs), change detection features (CDFs), and acoustic texture features (ATFs)
Street, road, talking, raining, bar, car
Hilbert transform, discrete chirplet transform
Y
ML, Feature extraction
29
IEEE16
Wang J. et al.
Environmental Sound Classification Using Hybrid SVM/KNN Classifier and MPEG-7 Audio Low-Level Descriptor
IEEE
2006
Environmental sound classification
Hybrid SVM/KNN
Audio Spectrum Centroid, Audio Spectrum Spread, Audio Spectrum Flatness
Environment
The proposed hybrid SVM/KNN classifier outperforms the HMM classifier in the MPEG-7 sound recognition tool
male speech (50), female speech (50), cough (50), laughing (49), screaming (26), dog barking (50), cat mewing (45), frog wailing (50), piano (40), glass breaking (34), gun shooting (33), and knocking (50). There are 527 sound files in total in the database.
SVM, KNN, Feature extraction (Audio Spectrum Centroid, spread and flatness)
ML
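One plausible reading of the hybrid SVM/KNN classifier, sketched below: trust the SVM where its decision margin is confident, otherwise defer to KNN. The fusion rule and threshold are assumptions; the paper's exact combination scheme may differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

class HybridSVMKNN:
    """Use the SVM where its margin is confident, else fall back to KNN."""
    def __init__(self, margin=0.5):
        self.svm = SVC()
        self.knn = KNeighborsClassifier()
        self.margin = margin

    def fit(self, X, y):
        self.svm.fit(X, y)
        self.knn.fit(X, y)
        return self

    def predict(self, X):
        conf = np.abs(self.svm.decision_function(X))
        if conf.ndim > 1:               # multi-class: take the largest margin
            conf = conf.max(axis=1)
        pred = self.svm.predict(X)
        unsure = conf < self.margin
        if unsure.any():                # low-margin samples go to KNN
            pred[unsure] = self.knn.predict(X[unsure])
        return pred
```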
30
IEEE17
Piczak K.
ENVIRONMENTAL SOUND CLASSIFICATION WITH CONVOLUTIONAL NEURAL NETWORKS
IEEE
environmental sound, convolutional neural networks, classification
2015
Environmental sound classification
CNN
ESC-50 and ESC-10, UrbanSound 8k
Environment
Publicly available datasets of environmental recordings are still very limited, both in number and in size.
UrbanSound8K dataset (LP - 73.1%, US - 73.7%)
Classes from dataset
ReLU
Y
ML
CNN
31
IEEE18
Salamon J. & Bello J.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
IEEE
Environmental sound classification, deep convolutional neural networks, deep learning, urban sound dataset
2016
Environmental sound classification
CNN
Environment
CNN
Y
32
IEEE19
Wang J. et al.
Gabor-Based Nonuniform Scale-Frequency Map for Environmental Sound Classification in Home Automation
IEEE
Environmental sound classification, feature extraction, Gabor function, home automation, matching pursuit (MP), nonuniform scale-frequency map
2013
Environmental sound classification
SVM
Gabor Dictionary Based on Critical Frequency Bands
Nonuniform Scale-Frequency Map
Dimensional Reduction of Scale-Frequency Maps Using Principal Component Analysis and Linear Discriminant Analysis
Environment
The proposed feature is more appropriate for practical use, especially environmental sound classification, since it has higher robustness against noise.
0.8621
nonuniform scale frequency classifier
Y
ML
33
IEEE20
Nayak D. et al.
Machine Learning Models for the Hearing Impairment Prediction in Workers Exposed to Complex Industrial Noise: A Pilot Study
IEEE
Complex noise exposure, Hearing impairment, Machine learning, Noise-induced hearing loss
2018
Hearing Impairment Prediction in Workers Exposed to Complex Industrial Noise
Environment
Data sets were collected from 1,644 workers exposed to complex noises in 53 workshops of 17 factories in the Zhejiang province of China
78.6 and 80.1
34
IEEE21
Tokozume Y. & Harada T.
LEARNING ENVIRONMENTAL SOUNDS WITH END-TO-END CONVOLUTIONAL NEURAL NETWORK
IEEE
Environmental sound classification, convolutional neural network, end-to-end system, feature learning
2017
ESC
We refer to our CNN as EnvNet
Urbansound 8K, YorNoise
Environment
81.3
CNN
35
IEEE22
local binary pattern
36
IEEE23
ML
CNN
ML
37
ScienceDirect1
Nossier S. et al.
Enhanced smart hearing aid using deep neural networks
SCIENCE DIRECT
Deep learning; Dropout; Noise of interest awareness; Smart hearing aid; Speech enhancement
2019
Smart hearing aid
DNN
Hearing aid
89
Car horn
NN
ML
38
ScienceDirect2
Abdoli et al.
End-to-end environmental sound classification using a 1D convolutional neural network
SCIENCE DIRECT
Convolutional neural network, environmental sound classification, deep learning, Gammatone filterbank
2019
Environmental sound classification using a 1D CNN
CNN
Environment
Sound
Feature extraction, MSE
Y
ML
CNN
39
ScienceDirect3
Mushtaq Z. & Su S.F.
Environmental sound classification using a regularized deep convolutional neural network with data augmentation
SCIENCE DIRECT
Data augmentation, environmental sound classification, regularization, deep convolutional neural network, Urbansound8k
2020
Environmental Sound Classification
DCNN
Environment
95.3
CNN
Y
ML
CNN
40
ScienceDirect4
Chen Y. et al.
Environmental sound classification with dilated convolutions
SCIENCE DIRECT
Sound information retrieval, environmental sound classification, dilated convolutions
2018
Sound signal retrieval
CNN
Sound retrieval
ReLU, Softmax value, cross entropy, CNN
Y
ML
CNN
41
ScienceDirect6
Demir F. et al.
A new pyramidal concatenated CNN approach for environmental sound classification
SCIENCE DIRECT
Sound classification, deep learning, SVM, STFT, CNN
2020
Environmental sound classification
Deep learning CNN
Short-Time Fourier Transform
VGGNet-16, VGGNet-19, DenseNet-201
UrbanSound8K, ESC-10, ESC-50
Environment
Sound
94.8, 81.4, 78.1
Short Time Fourier Transform (STFT)
Y
ML
42
ScienceDirect7
Mushtaq Z. et al.
Spectral images based environmental sound classification using CNN with meaningful data augmentation
SCIENCE DIRECT
Environmental sound classification, convolutional neural network, spectrogram, data augmentation, transfer learning
2021
An approach to spectral-image-based environmental sound classification using Convolutional Neural Networks (CNN) with meaningful data augmentation
CNN
ESC-10, ESC-50, UrbanSound8K
Environment
Sound
99.04, 99.49, 97.57
CNN
Y
ML, Transfer Learning
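This row pairs spectral images with transfer learning. A hedged sketch of the usual recipe, fine-tuning an ImageNet-pretrained backbone on spectrogram images; ResNet-18 and the 10-class head are illustrative assumptions, not necessarily the paper's choices:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its feature extractor.
net = models.resnet18(weights="IMAGENET1K_V1")
for p in net.parameters():
    p.requires_grad = False
# Replace the classification head for 10 environmental sound classes;
# spectrograms would be fed as 3-channel 224x224 images.
net.fc = nn.Linear(net.fc.in_features, 10)
```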
43
ScienceDirect8
Ahmad S. et al.
Environmental sound classification using optimum allocation sampling based empirical mode decomposition
SCIENCE DIRECT
Environmental sound classification, optimum allocation sampling, empirical mode decomposition, multi-class least squares support vector machine, extreme learning machine
2020
Automatic environmental sound classification
Optimum allocation sampling
ESC-10
Environment
Sound
87.25, 77.61
dog bark, rain, sea waves, baby cry, clock tick, person sneeze, helicopter, chainsaw, rooster, and fire crackling
Empirical Mode Decomposition, feature extraction (Approximate Entropy, Permutation Entropy, Log-energy Entropy, Zero Crossing Rate), SVM, NN
Y
ML
44
ScienceDirect9
Mushtaq Z. & Su S.
Environmental sound classification using a regularized deep convolutional neural network with data augmentation
SCIENCE DIRECT
Data augmentation, environmental sound classification, regularization, deep convolutional neural network, Urbansound8k, ESC-10, ESC-50
2020
ESC
CNN
Mel-spectrogram (Mel), Mel-Frequency Cepstral Coefficient (MFCC) and Log-Mel by using DCNN
ESC-10, ESC-50, US8K
94.9, 89.2, 95.3
Y
Y
Y
45
Springer1
Medhat F. et al.
Masked Conditional Neural Networks for Environmental Sound Classification
Springer
Conditional Neural Networks (CLNN), Masked Conditional Neural Networks (MCLNN), Restricted Boltzmann Machine (RBM), Conditional Restricted Boltzmann Machine (CRBM), Deep Belief Nets, Environmental Sound Recognition (ESR), YorNoise
2017
Environmental sound classification
Conditional Neural Network
Urbansound 8K, YorNoise
Environment
73
air conditioner, car horns, children playing, dog bark, drilling, engine idling, gunshot, jackhammers, siren and street music
CNN, Feature extraction
Y
CNN
CNN
46
Springer2
Zhang Z. et al.
Deep Convolutional Neural Network with Mixup for Environmental Sound Classification
Springer
Environmental sound classification, convolutional neural network, mixup
2018
ESC
CNN
The ESC-10 dataset is a subset of 10 classes (400 samples); the UrbanSound8K dataset is a collection of 8732 short (up to 4 s) audio clips of urban sounds.
Environment
91.7, 83.9, 83.7
dog bark, rain, sea waves, baby cry, clock tick, person sneeze, helicopter, chainsaw, rooster, fire crackling, air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music
We propose a novel CNN as our ESC system model, inspired by VGGNet. In order to achieve a better performance on ESC, the effect of the mixup hyper-parameter α is further explored: Figure 5 shows the change of accuracy with α ranging over [0.1, 0.5], and the best accuracy on all three datasets is achieved when α = 0.2.
Generating training data
Y
ML
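Mixup, as used in this row, trains on convex combinations of example pairs and their one-hot labels; the record reports α = 0.2 as best. A minimal sketch:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two training examples; y1/y2 are one-hot label vectors."""
    lam = np.random.beta(alpha, alpha)  # mixing weight drawn from Beta(α, α)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```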
47
INTERSPEECH1
Sailor B. et al.
Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification
INTERSPEECH
Unsupervised Filterbank Learning, ConvRBM, Sound Classification, CNN
2017
Environmental sound classification
Convolutional Restricted Boltzmann Machine (ConvRBM) filterbank learning with a supervised Convolutional Neural Network (CNN)
ESC -50 dataset
Environment
Sound/Audio Signal
The proposed ConvRBM-BANK outperforms EnvNet [18] even without the system combination; this shows the significance of unsupervised generative training using ConvRBM.
78.45
ConvRBM-BANK performs significantly better than CNN with FBEs
CNN
Y
Y
CNN
48
INTERSPEECH2
Sharma J. et al.
Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network
INTERSPEECH
Convolutional Neural Networks, Attention, Multiple Feature Channels, Environment Sound Classification
2020
Environmental sound classification
CNN
Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), Constant Q-transform (CQT) and Chromagram
ESC-10, ESC-50, US8K
Environment
94.75 (ESC-10), 87.45 (ESC-50), 97.52 (US8K)
We stop at 128 features, which produces the best results, to avoid increasing the complexity of the model.
CNN
N
Y
CNN
49
arXiv preprint
Mohaimenuzzaman Md. et al.
Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices
arXiv
Deep Learning, Audio Classification, Environmental Sound Classification, Acoustics, Intelligent Sound Recognition, Micro-Controller, IoT, Edge-AI, ESC-50
2021
ESC
For ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%) and ESC-50 (87.1%), we describe the compression pipeline and show that it allows us to achieve 97.22% size reduction and 97.28%
ACDNet is implemented in PyTorch version 1.7.1 and the Wavio audio library is used to process the audio files.
ESC-10, ESC-50, US8K
Environment
While limitations of the programming environment have restricted the accuracy of our current test deployment on a physical MCU, we have conclusively shown that 81.5% accuracy is achievable on such a resource-impoverished device, close to the state-of-the-art and above human performance.
96.65
It is likely that the performance can be improved by using quantization-aware training and pruning. Secondly, we would like to try the SpArSe approach for further optimisation, now that we have developed Micro-ACDNet as a suitable starting point.
Training sample, learning rate, pruning process
Y
CNN
Hybrid pruning
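The compression pipeline above relies on pruning. A minimal PyTorch sketch of magnitude pruning; the paper's hybrid pruning and quantization procedure is more involved, and the 90% ratio here is illustrative:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Conv2d(16, 32, kernel_size=3)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # zero the smallest 90%
prune.remove(layer, "weight")  # bake the sparsity into the weight tensor
```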
50
MDPI1
Dempster-Shafer Evidence Theory
ML
CNN
51
crossover fitting, user preference matching, population diversity
ML
Presentation, application, storage
52
MDPI3
Speech Enhancement, CNN
CNN
53
MDPI4
soundscaping, source mixing and source modeling, STFT, posterior distribution
soundscaping, source mixing and source modeling
54
MDPI5
acoustic feedback signal, microphone, feedback and error signals, computational complexity
Hearing aid structure/circuit
55
MDPI6
56
MDPI7
Sampling frequency, Speech Quality Perception Evaluation
ML, Feature extraction, Evaluation process
denoising autoencoder networks