# | Paper | Author | Title | Publisher | Keywords | Year | Sound Classification Applications / Use Cases | Algorithms | Preprocessing | Feature Extraction | Steps / Process | Denoising Techniques | Tools Used | Context Awareness in Sound Classification | Design / Model / Framework | Setup Images | Field Tests | Results | Challenges / Limitations | Accuracy Levels | Other Information | Classifies | Formula | Graphs | Algorithm / Flowchart | Architecture | Pseudocode | Network / Component Diagram |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | ACM1 | Hong R. et al. | Video Accessibility Enhancement for Hearing-Impaired Users | ACM | Accessibility, dynamic captioning, hearing impairment | 2011 | Dynamic Captioning | Viola-Jones, Haar-feature-based cascade mouth detector | script location, script-speech alignment, and voice volume estimation | videos with scripts, but can be extended to process general videos without scripts | dynamic captioning puts scripts at suitable positions to help the hearing-impaired audience better recognize the speakers | script location, script-speech alignment, and voice volume estimation | Video Feeds | better tracking of the scripts and perception of the moods conveyed by the variation of volume | Focuses more on dynamic captioning than on the user interface | 80 | Speech in video to dynamic caption | Y-Gaussian distance, linear representation | y | Y-accessibility enhancement & script-speech alignment | face mapping | |||||||||
3 | ACM2 | Wang W. et al. | A Smartphone-based Digital Hearing Aid to Mitigate Hearing Loss at Specific Frequencies | ACM | Digital hearing aids, smartphone, sound classification | 2014 | Hearing loss at certain frequencies among the elderly | GMM classifier, Weighted Overlap-Add (WOLA) filter bank | speech processing in the frequency domain and sound classification to classify input sounds into speech and speech-with-noise categories | WOLA filter banks split the sound into different frequency bands, which are then amplified (or attenuated) in the specific frequency ranges where the user's hearing is impaired. Finally, the WOLA synthesis filter bank reconstructs the acoustic signal from the amplified sub-band signals, which is sent to the receiver for playback (see the sub-band amplification sketch below the table). | Smartphones | | acoustic signals | frequency-domain processing is currently slow due to computational complexity | Audio Frequencies | Y | Y | Y | hearing aid app (Application and Storage) | |||||||||||
4 | ACM3 | Bountourakis V. et al. | Machine Learning Algorithms for Environmental Sound Recognition: Towards Soundscape Semantics | ACM | Environmental Sound Recognition, audio classification, semantic audio analysis, computer audition, feature extraction, feature selection, machine learning algorithms | 2015 | Comparison between algorithms for sound classification | • k-Nearest Neighbors (k-NN) • Naive Bayes • Support Vector Machines (SVM) • C4.5 algorithm (decision tree) • Logistic Regression • Artificial Neural Networks (ANN) | stationary (frequency-based) feature extraction and non-stationary (time-frequency-based) feature extraction | database, segmentation, feature extraction, feature selection, classification, evaluation | Environmental Sounds | database, segmentation, feature extraction, feature selection, classification, evaluation | Sound Signals | the highest classification rates were achieved by k-NN with feature set 3 (85.8%), ANN with feature set 2 and use of PCA (86.95%), and SVM with feature set 2 and use of PCA (85.41%). | airplanes, alarms, applause, birds, dogs, footsteps, motorcycles, rain, rivers, sea waves, thunder, wind. | |||||||||||||||
5 | ACM4 | Bragg D. et al. | A Personalizable Mobile Sound Detector App Design for Deaf and Hard-of-Hearing Users | ACM | Sound detection, accessibility, deaf, hard-of-hearing | 2016 | notifying deaf and hard-of-hearing people about sounds around them | Smartphones, mobile application | Accessibility | Sound Signals | 87 participants (51 female, 36 male); 50 were deaf and 37 were hard-of-hearing. Ages ranged from 18 to 99 (mean 42, std dev 17). | app design to be usable by deaf and hard-of-hearing users recording training examples of sound | 70 | No participants used apps to monitor sounds outside of the study | participants revealed they wanted classifications of dropping items, walking/running behind, moving carts, fire drills, printers, conversations, and baby sounds. | vehicles passing by, children having bad dreams, smoke and carbon monoxide detectors, appliances making unusual noises, water running, socializing, something dropping on the floor, gunshots, conversations, and distinguishing between multiple sources with a similar frequency range. | Users train the system using the sounds at home | |||||||||||||
6 | ACM6 | Kurnaz S. & Aljabery M. | Predict the type of hearing aid of audiology patients using data mining techniques | ACM | audiology, National Health System, audiograms, BTE, ITE, Machine Learning, Data Mining, Hearing Aid. | 2018 | Choice of hearing aid | AdaBoost classifier, Random Forest classifier, Logistic Regression, Orange Canvas modeler | Hearing aid choice | ML Model | ||||||||||||||||||||
7 | ACM7 | Li M. et al. | Environmental Noise Classification Using Convolution Neural Networks | ACM | Environmental noise; Convolution Neural Network (CNN); Short-Time Fourier Transform (STFT); Log Mel-Frequency Spectral Coefficients (MFSCs); Tensorflow | 2018 | CNN | Short-Time Fourier Transform (STFT) | Environment | Log Mel-Frequency Spectral Coefficients | Y ML, STFT | CNN | ||||||||||||||||||
8 | ACM8 | Alsouda Y. et al. | IoT-based Urban Noise Identification Using Machine Learning: Performance of SVM, KNN, Bagging, and Random Forest | ACM | urban noise; smart cities; support vector machine (SVM); k-nearest neighbors (KNN); bootstrap aggregation (Bagging); random forest; mel-frequency cepstral coefficients (MFCC); internet of things (IoT). | 2019 | classification of environmental sounds | SVM, KNN, Bagging, Random Forest | mel-frequency cepstral coefficients (MFCC; see the extraction sketch below the table) | Feature extraction, model training, classifier, prediction | Raspberry Pi, microphone HAT | Environment | Feature extraction, model training, classifier, prediction | Sound | high noise identification accuracy in the range 88%–94% | Accuracy by classifier: SVM 93.87%, KNN 93.88%, Bagging 87.81%, Random Forest 89.91% (see the classifier-comparison sketch below the table) | quietness, silence, car horn, children playing, gunshot, jackhammer, siren, and street music | K-Nearest Neighbors | ML, MFCC | |||||||||||
9 | ACM9 | Wang et al. | Privacy-aware environmental sound classification for indoor human activity recognition | ACM | Smart Buildings, Privacy-aware Environmental Sound Recognition, Voice Bands Stripping, Internet Of Things, Computational Efficiency, Web Crawling, Mel Frequency Cepstral Coefficients, Linear Predictive Cepstral Coefficients, Support Vector Machine | 2019 | indoor environmental sound classification | Decision tree, Random Forest, Mixed Gaussian, Naive Bayes, SVM (linear & RBF kernels), Artificial Neural Network | Environment | 90 | Y | ML, Feature extraction | ||||||||||||||||||
10 | ACM10 | Inik O. & Seker H. | Convolutional Neural Networks for the Classification of Environmental Sounds | ACM | Environmental sound classification (ESC), Deep Learning, Convolutional Neural Networks (CNN), Urbansound8k | 2020 | classification of environmental sounds | CNN | Intel® Core™ i9-7900X 3.30 GHz × 20 processor, 64 GB RAM and 2 × GeForce RTX 2080 Ti graphics cards; Matlab R2020a 64-bit (win64) | Environment | 82.5 | air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music | CNN | |||||||||||||||||
11 | ACM11 | Sigtia S. et al. | Automatic Environmental Sound Recognition: Performance Versus Computational Cost | ACM | Automatic environmental sound recognition, computational auditory scene analysis, deep learning, machine learning. | 2016 | classification of environmental sounds | Gaussian Mixture Models, SVM, DNN, RNN | Mel-frequency cepstral coefficient (MFCC) | Baby Cry Data Set, Smoke Alarm Data Set | Environment | Deep Neural Networks yield the best ratio of sound classification accuracy across a range of computational costs, while Gaussian Mixture Models offer reasonable accuracy at a consistently small cost, and Support Vector Machines stand between the two in the trade-off between accuracy and computational cost. | smoke alarms and baby cries | Gaussian Mixture Models, SVM, DNN (feed-forward), RNN | Y | |||||||||||||||
12 | ACM12 | Laar V. & Vries B. | A Probabilistic Modeling Approach to Hearing Loss Compensation | ACM | Hearing aids, hearing loss compensation, probabilistic modeling, factor graphs, message passing, machine learning | 2016 | Hearing aid (HA) algorithm tuning; a probabilistic modeling approach to the design of HA algorithms | Bayes factor (BF) | evaluation of signal processing (SP), parameter estimation (PE) and model comparison (MC) tasks | Speech Understanding | performance evaluation, signal processing | Y | hearing aid signal processing | hearing aid agent | ||||||||||||||||
13 | ACM13 | Salehi H. et al. | Learning-Based Reference-Free Speech Quality Measures for Hearing Aid Applications | ACM | Hearing aids, speech quality, perceptual linear prediction, gammatone filterbank energies, reference-free quality assessment, support vector regression, machine learning. | 2018 | Speech quality of hearing aids | Speech Understanding | A group of 18 hearing-impaired (HI) listeners was recruited to provide the speech quality ratings | Linear prediction | Feature extraction | |||||||||||||||||||
14 | IEEE1 | Demir F. et al. | A New Deep CNN Model for Environmental Sound Classification | IEEE | Environmental sound classification, spectrogram images, CNN model, deep features | 2020 | Environmental sound classification | CNN | the spectrogram method converts the signals into time-frequency images (the loudness of a signal over time at the different frequencies present in a waveform); deep feature extraction | DCASE-2017 ASC and the UrbanSound8K datasets | Environment | 86.7 | air conditioner, car horn, children, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music | STFT, CNN, Accuracy (see the log-mel spectrogram and CNN sketches below the table) | Y | ML, CNN, KNN | ||||||||||||||
15 | IEEE2 | Ridha A. & Shehieb W. | Assistive Technology for Hearing-Impaired and Deaf Students Utilizing Augmented Reality | IEEE | Assistive technology; Augmented Reality; Deaf; Education; Hearing-Impairment; Machine Learning. | 2021 | augmented reality glasses that assist students in their educational journey with real-time transcription, speech emotion recognition, and sound indication features, as well as classroom assistive tools | AR Glasses | Environmental Sounds | live transcription feature that uses Google Cloud services; transcribed lectures are stored for future reference in the classroom tools feature and can be shared among students, making it a platform for communication between students | 71.3 | Car Horn, Siren, Gunshots, Broken Glass | ML | PCB Schematic | ||||||||||||||||
16 | IEEE3 | Melati A. & Karyono K. | Android-Based Sound Detection Application for Hearing-Impaired Using AdaBoostM1 Classifier with REPTree Weak Learner | IEEE | sound detection for hearing-impaired, machine learning, AdaBoostM1, REPTree, Android | 2014 | help hearing-impaired people detect sound around them and recognize the sound | AdaBoostM1 functioning as the classifier and REPTree as the weak learner | one database of indoor sounds and a second database of outdoor sounds, 23 sounds in total | Environment | Low accuracy; a better approach is proposed | 40 | baby crying, beep, broom sweeps, door creaking, door slam, doorbell, footstep, hairdryer, knocking door, ringing, water runs, whistle, airplanes, applause, birds chirp, car honk, crowded, dog bark, engine start, screaming, thunder, train, wind blowing | Y | ML | AdaBoostM1, Bagging | ||||||||||||||
17 | IEEE4 | Chen C. et al. | Audio-Based Early Warning System of Sound Events on the Road for Improving the Safety of Hearing-Impaired People | IEEE | Android application, warning, audio detection, machine learning | 2019 | Road safety for the hearing impaired | Convolutional Neural Networks (CNNs) | MFCC | UrbanSound8K | Safety | CNN is effective for environmental sound classification tasks given appropriate parameter settings and feature sets | 66.4 | Car-approaching, Car-horn, Children-playing, Dog-barking, Gun-shot, Construction, Siren, Engine-idling | MFCC | Y | ML | |||||||||||||
18 | IEEE5 | Bhat G. et al. | Automated machine learning based speech classification for hearing aid applications and its real-time implementation on smartphone | IEEE | Automated Machine Learning, AutoML, Voice Activity Detection (VAD), Hearing aid devices (HADs), smartphone, real-time | 2020 | speech classification | AutoML-based VAD, CNN | Speech Understanding | Speech | Signal model and training feature | Y | ML | |||||||||||||||||
19 | IEEE6 | Healy E. & Yoho S. | Difficulty understanding speech in noise by the hearing impaired: Underlying causes and technological solutions | IEEE | 2016 | poor speech understanding | single-microphone algorithm to extract speech from noise, DNN | Speech Understanding | Groups of 10 NH and 10 HI subjects heard IEEE sentences in unprocessed speech-plus-noise conditions and corresponding algorithm-processed conditions. In this study, multi-talker babble and cafeteria noise, each at two SNRs, were employed. | Speech | ML | |||||||||||||||||||
20 | IEEE7 | Jatturas C. et al. | Feature-based and Deep Learning-based Classification of Environmental Sound | IEEE | 2019 | comparison of techniques for environmental sound classification | SVM, MLP, Deep Learning | UrbanSound8K, Scikit-learn and Tensorflow | Environment | 75 | Air conditioner, Children Playing, Engine Idling, Siren, and Street Music | STFT, NN, SVM | Y | CNN | ||||||||||||||||
21 | IEEE8 | Saleem N. et al. | Machine Learning Approach for Improving the Intelligibility of Noisy Speech | IEEE | Machine learning, speech enhancement, intelligibility, time-frequency masking, deep neural networks | 2020 | Intelligibility of Noisy Speech | Speech Understanding | RNN | Y | RNN | |||||||||||||||||||
22 | IEEE9 | Jatturas C. et al. | Recurrent Neural Networks for Environmental Sound Recognition using Scikit-learn and Tensorflow | IEEE | 2019 | Environmental sound classification | MLP, SVM | MFCC | UrbanSound8K | Environment | deep neural network models outperform both MLP and SVM with PCA | 90 | Car-approaching, Car-horn, Children-playing, Dog-barking, Gun-shot, Construction, Siren, Engine-idling | STFT, SVM | Y | RNN | ||||||||||||||
23 | IEEE10 | Davis N. & Suresh K. | Environmental Sound Classification using Deep Convolutional Neural Networks and Data Augmentation | IEEE | 2018 | Environmental sound classification | Deep Convolutional Neural Network | Time Stretching, Pitch Shifting, Dynamic Range Compression, Background Noise, Linear Prediction Cepstral Coefficients (LPCC) (see the augmentation sketch below the table) | UrbanSound8K | Environment | 80 | air conditioner, car horns, children playing, dog bark, drilling, engine idling, gunshot, jackhammers, siren and street music | LPCC | Y | ||||||||||||||||
24 | IEEE11 | Chu S. et al. | Environmental Sound Recognition With Time–Frequency Audio Features | IEEE | Audio classification, auditory scene recognition, data representation, feature extraction, feature selection, matching pursuit, Mel-frequency cepstral coefficient (MFCC). | 2009 | Environmental sound classification | Environment | Restaurant, casino, train, rain, and street with ambulance | short-time energy, zero-crossing rate, signal decomposition | Y | |||||||||||||||||||
25 | IEEE12 | Chu S. et al. | Where Am I? Scene Recognition for Mobile Robots Using Audio Features | IEEE | 2006 | Environmental sound classification | Environment | ||||||||||||||||||||||
26 | IEEE13 | Ullo S. et al. | Hybrid Computerized Method for Environmental Sound Classification | IEEE | Environmental sound classification, Optimal allocation sampling, spectrogram, convolutional neural network, classification techniques | 2020 | Environmental sound classification | AlexNet and Visual Geometry Group (VGG)-16 networks; decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse and weighted kernel), support vector machine, linear discriminant analysis, bagged tree and softmax classifiers | short-time Fourier transform (STFT) | Deep Feature Extraction | ESC-10, a ten-class environmental sound dataset. The experiments were carried out in MATLAB (R2018) on a computer with 8 GB RAM and a 64-bit, third-generation Intel i7 processor at 3.4 GHz. | Environment | AlexNet (FC-6) with a fine-kernel decision tree achieves 89.9% | 90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% are obtained with a decision tree, k-nearest neighbor, support vector machine, linear discriminant analysis, bagged tree and softmax classifier, respectively | The methods proposed so far have been limited in terms of performance, so an effective and robust method is required to classify environmental signals accurately. In the present work the authors reduce the dimension of the data by OAS, transform the reduced data into images by STFT, and extract features from the spectrograms using two pre-trained CNNs. | Classes from dataset | STFT, Sample Size | Y | ML | CNN | ||||||||||
27 | IEEE14 | Zhang X. et al. | Dilated Convolution Neural Network with LeakyReLU for Environmental Sound Classification | IEEE | Environmental sound classification; Dilated Convolution Neural Network; Leaky Rectified Linear Unit; Activation Function | 2017 | Environmental sound classification | a dilated CNN-based ESC (D-CNN-ESC) (see the dilated-convolution sketch below the table) | transforming acoustic waves into low-level feature vectors following a commonly used method | UrbanSound8K, ESC50, and CICESE | Environment | the proposed D-CNN-ESC system outperforms the state-of-the-art ESC results obtained by a very deep CNN-ESC system on the UrbanSound8K dataset; the absolute error of the method is about 10% less than that of the compared method | All classes in the 3 datasets | Y | ML | CNN | ||||||||||||||
28 | IEEE15 | Han B. & Hwang E. | Environmental Sound Classification Based on Feature Collaboration | IEEE | Environmental sound recognition, discrete chirplet transform, discrete curvelet transform, discrete Hilbert transform, feature extraction | 2009 | Environmental sound classification | SVM | equal-loudness level contours were applied to each frame so that the signal more accurately represents human sound perception, and the silence signal was eliminated from the start and end points of each frame | For traditional features, mel-frequency cepstral coefficients (MFCC), zero-crossing rate (ZCR), spectral centroid (SC), spectral spread (SS), spectral flatness (SF), and spectral flux (SFX) were collected. | Environment | CDFs and ATFs are more effective than TFs for classification; furthermore, when combined with TFs, they achieved the maximum accuracy | three types of features: traditional features (TFs), change detection features (CDFs), and acoustic texture features (ATFs) | Street, road, talking, raining, bar, car | Hilbert transform, discrete chirplet transform | Y | ML, Feature extraction | |||||||||||||
29 | IEEE16 | Wang J. et al. | Environmental Sound Classification Using Hybrid SVM/KNN Classifier and MPEG-7 Audio Low-Level Descriptor | IEEE | 2006 | Environmental sound classification | Hybrid SVM/KNN | Audio Spectrum Centroid, Audio Spectrum Spread, Audio Spectrum Flatness | Environment | the proposed hybrid SVM/KNN classifier outperforms the HMM classifier in the MPEG-7 sound recognition tool | male speech (50), female speech (50), cough (50), laughing (49), screaming (26), dog barking (50), cat mewing (45), frog wailing (50), piano (40), glass breaking (34), gun shooting (33), and knocking (50); 527 sound files in total in the database | SVM, KNN, Feature extraction (Audio Spectrum Centroid, Spread and Flatness) | ML | |||||||||||||||||
30 | IEEE17 | Piczak K. | Environmental Sound Classification with Convolutional Neural Networks | IEEE | environmental sound, convolutional neural networks, classification | 2015 | Environmental sound classification | CNN | ESC-50 and ESC-10, UrbanSound8K | Environment | publicly available datasets of environmental recordings are still very limited, both in number and in size | UrbanSound8K dataset (LP - 73.1%, US - 73.7%) | Classes from dataset | ReLU | Y | ML | CNN | |||||||||||||
31 | IEEE18 | Salamon J. & Bello J. | Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification | IEEE | Environmental sound classification, deep convolutional neural networks, deep learning, urban sound dataset | 2016 | Environmental sound classification | CNN | Environment | CNN | Y | |||||||||||||||||||
32 | IEEE19 | Wang J. et al. | Gabor-Based Nonuniform Scale-Frequency Map for Environmental Sound Classification in Home Automation | IEEE | Environmental sound classification, feature extraction, Gabor function, home automation, matching pursuit (MP), nonuniform scale-frequency map | 2013 | Environmental sound classification | SVM | Gabor dictionary based on critical frequency bands; nonuniform scale-frequency map; dimensional reduction of scale-frequency maps using Principal Component Analysis and Linear Discriminant Analysis | Environment | the proposed feature is more appropriate for practical use, especially environmental sound classification, since it has higher robustness against noise | 86.21 | nonuniform scale-frequency classifier | Y | ML | |||||||||||||||
33 | IEEE20 | Nayak D. et al. | Machine Learning Models for the Hearing Impairment Prediction in Workers Exposed to Complex Industrial Noise: A Pilot Study | IEEE | Complex noise exposure, Hearing impairment, Machine learning, Noise-induced hearing loss | 2018 | Hearing impairment prediction in workers exposed to complex industrial noise | Environment | Data sets were collected from 1,644 workers exposed to complex noise in 53 workshops of 17 factories in the Zhejiang province of China | 78.6 and 80.1 | ||||||||||||||||||||
34 | IEEE21 | Tokozume Y. & Harada T. | Learning Environmental Sounds with End-to-End Convolutional Neural Network | IEEE | Environmental sound classification, convolutional neural network, end-to-end system, feature learning | 2017 | ESC | EnvNet, the authors' end-to-end CNN | UrbanSound8K, YorNoise | Environment | 81.3 | CNN | ||||||||||||||||||
35 | IEEE22 | local binary pattern | ||||||||||||||||||||||||||||
36 | IEEE23 | ML | CNN | ML | ||||||||||||||||||||||||||
37 | ScienceDirect1 | Nossier S. et al. | Enhanced smart hearing aid using deep neural networks | SCIENCE DIRECT | Deep learning; Dropout; Noise of interest awareness; Smart hearing aid; Speech enhancement | 2019 | Smart hearing aid | DNN | Hearing aid | 89 | Car horn | NN | ML | |||||||||||||||||
38 | ScienceDirect2 | Abdoli et al. | End-to-end environmental sound classification using a 1D convolutional neural network | SCIENCE DIRECT | Convolutional neural network, Environmental sound classification, Deep learning, Gammatone filterbank | 2019 | Environmental sound classification using 1D CNN | CNN | Environment | Sound | Feature extraction, MSE | Y | ML | CNN | ||||||||||||||||
39 | ScienceDirect3 | Mushtaq Z. & Su S.F. | Environmental sound classification using a regularized deep convolutional neural network with data augmentation | SCIENCE DIRECT | Data augmentation, Environmental sound classification, Regularization, Deep convolutional neural network, Urbansound8k | 2020 | Environmental Sound Classification | DCNN | Environment | 95.3 | CNN | Y | ML | CNN | ||||||||||||||||
40 | ScienceDirect4 | Chen Y. et al. | Environmental sound classification with dilated convolutions | SCIENCE DIRECT | Sound information retrieval, Environmental sound classification, Dilated convolutions | 2018 | Sound signal retrieval | CNN | Sound retrieval | ReLU, Softmax value, cross entropy, CNN | Y | ML | CNN | |||||||||||||||||
41 | ScienceDirect6 | Demir F. et al. | A new pyramidal concatenated CNN approach for environmental sound classification | SCIENCE DIRECT | Sound classification, Deep learning, SVM, STFT, CNN | 2020 | Environment sound classification | Deep learning CNN | Short-Time Fourier Transform | VGGNet-16, VGGNet-19, DenseNet-201 | UrbanSound8K, ESC-10, ESC-50 | Environment | Sound | 94.8, 81.4, 78.1 | Short-Time Fourier Transform (STFT) | Y | ML | |||||||||||||
42 | ScienceDirect7 | Mushtaq Z. et al. | Spectral images based environmental sound classification using CNN with meaningful data augmentation | SCIENCE DIRECT | Environmental sound classification, Convolutional neural network, Spectrogram, Data augmentation, Transfer learning | 2021 | approach of spectral-image-based environmental sound classification using Convolutional Neural Networks (CNN) with meaningful data augmentation | CNN | ESC-10, ESC-50, UrbanSound8K | Environment | Sound | 99.04, 99.49, 97.57 | CNN | Y | ML, Transfer Learning | |||||||||||||||
43 | ScienceDirect8 | Ahmad S. et al. | Environmental sound classification using optimum allocation sampling based empirical mode decomposition | SCIENCE DIRECT | Environmental sound classification, Optimum allocation sampling, Empirical mode decomposition, Multi-class least squares support vector machine, Extreme learning machine | 2020 | Automatic environment sound classification | Optimum allocation sampling | ESC-10 | Environment | Sound | 87.25, 77.61 | dog bark, rain, sea waves, baby cry, clock tick, person sneeze, helicopter, chainsaw, rooster, and fire crackling | Empirical Mode Decomposition, Feature extraction (Approximate Entropy, Permutation Entropy, Log-energy entropy, Zero Crossing Rate), SVM, NN | Y | ML | ||||||||||||||
44 | ScienceDirect9 | Mushtaq Z. & Su S. | Environmental sound classification using a regularized deep convolutional neural network with data augmentation | SCIENCE DIRECT | Data augmentation, Environmental sound classification, Regularization, Deep convolutional neural network, Urbansound8k, ESC-10, ESC-50 | 2020 | ESC | CNN | Mel-spectrogram (Mel), Mel-Frequency Cepstral Coefficient (MFCC) and Log-Mel using DCNN | ESC-10, ESC-50, US8K | 94.9 (ESC-10), 89.2 (ESC-50), 95.3 (US8K) | Y | Y | Y | ||||||||||||||||
45 | Springer1 | Medhat F. et al. | Masked Conditional Neural Networks for Environmental Sound Classification | Springer | Conditional Neural Networks (CLNN), Masked Conditional Neural Networks (MCLNN), Restricted Boltzmann Machine (RBM), Conditional Restricted Boltzmann Machine (CRBM), Deep Belief Nets, Environmental Sound Recognition (ESR), YorNoise | 2017 | Environmental sound classification | Conditional Neural Network | UrbanSound8K, YorNoise | Environment | 73 | air conditioner, car horns, children playing, dog bark, drilling, engine idling, gunshot, jackhammers, siren and street music | CNN, Feature extraction | Y | CNN | CNN | ||||||||||||||
46 | Springer2 | Zhang Z. et al. | Deep Convolutional Neural Network with Mixup for Environmental Sound Classification | Springer | Environmental sound classification, Convolutional neural network, Mixup | 2018 | ESC | CNN | the ESC-10 dataset is a subset of 10 classes (400 samples); the UrbanSound8K dataset is a collection of 8732 short (up to 4 s) audio clips of urban sounds | Environment | 91.7 (ESC-10), 83.9 (ESC-50), 83.7 (UrbanSound8K) | dog bark, rain, sea waves, baby cry, clock tick, person sneeze, helicopter, chainsaw, rooster, fire crackling, air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music | a novel CNN inspired by VGG Net is proposed as the ESC system model; to achieve better ESC performance, the effect of the mixup hyper-parameter α is explored over the range [0.1, 0.5], and α = 0.2 achieves the best accuracy on all three datasets (see the mixup sketch below the table) | Generating training data | Y | ML | ||||||||||||||
47 | INTERSPEECH1 | Sailor B. et al. | Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification | INTERSPEECH | Unsupervised Filterbank Learning, ConvRBM, Sound Classification, CNN | 2017 | Environmental sound classification | supervised Convolutional Neural Network (CNN) | ESC-50 dataset | Environment | Sound/Audio Signal | the proposed ConvRBM-BANK outperforms EnvNet [18] even without the system combination; this shows the significance of unsupervised generative training using ConvRBM | 78.45 | ConvRBM-BANK performs significantly better than CNN with FBEs | CNN | Y | Y | CNN | ||||||||||||
48 | INTERSPEECH2 | Sharma J. et al. | Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network | INTERSPEECH | Convolutional Neural Networks, Attention, Multiple Feature Channels, Environment Sound Classification | 2020 | Environmental sound classification | CNN | Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), Constant Q-Transform (CQT) and Chromagram | ESC-10, ESC-50, US8K | Environment | 94.75 (ESC-10), 87.45 (ESC-50), 97.52 (US8K) | We stop at 128 features, which produces the best results, to avoid increasing the complexity of the model. | CNN | N | y | CNN | |||||||||||||
49 | arXiv | Mohaimenuzzaman Md. et al. | Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | arXiv preprint | Deep Learning, Audio Classification, Environmental Sound Classification, Acoustics, Intelligent Sound Recognition, Micro-Controller, IoT, Edge-AI, ESC-50 | 2021 | ESC | ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%) and ESC-50 (87.1%); the described compression pipeline achieves a 97.22% size reduction and a 97.28% FLOP reduction | ACDNet is implemented in PyTorch version 1.7.1 and the Wavio audio library is used to process the audio files. ESC-10, ESC-50, US8K | Environment | While limitations of the programming environment have restricted the accuracy of the current test deployment on a physical MCU, it has been conclusively shown that 81.5% accuracy is achievable on such a resource-impoverished device, close to the state of the art and above human performance. | 96.65 | it is likely that the performance can be improved by using quantization-aware training and pruning; the SpArSe approach could also be tried for further optimisations now that Micro-ACDNet provides a suitable starting point | ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%) and ESC-50 (87.1%); the described compression pipeline achieves a 97.22% size reduction and a 97.28% FLOP reduction | Training sample, Learning rate, pruning process | Y | CNN | Hybrid pruning | ||||||||||||
50 | MDPI1 | Dempster-Shafer Evidence Theory | ML | CNN | ||||||||||||||||||||||||||
51 | MDPI2 | crossover fitting, user preference matching, population diversity | ML | Presentation, application, storage | |||||||||||||||||||||||||||
52 | MDPI3 | Speech Enhancement, CNN | CNN | |||||||||||||||||||||||||||
53 | MDPI4 | soundscaping, source mixing and source modeling, STFT, posterior distribution | soundscaping, source mixing and source modeling | |||||||||||||||||||||||||||
54 | MDPI5 | acoustic feedback signal, microphone, feedback and error signals, computational complexity | Hearing aid structure/circuit | |||||||||||||||||||||||||||
55 | MDPI6 | |||||||||||||||||||||||||||||
56 | MDPI7 | Sampling frequency, Speech Quality Perception Evaluation | ML, Feature extraction, Evaluation process | denoising autoencoder networks | ||||||||||||||||||||||||||
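The sketches below illustrate, in Python, several techniques that recur throughout the table. They are minimal approximations under stated assumptions, not reproductions of any paper's implementation. First, MFCC feature extraction, the front end in the ACM8, IEEE4 and IEEE11 rows among others; this sketch assumes `librosa` is available, and the mean/std pooling over frames is an illustrative choice rather than any one paper's:

```python
import numpy as np
import librosa  # assumed dependency; any MFCC implementation would do


def extract_mfcc_features(path, sr=22050, n_mfcc=13):
    """Return a fixed-length clip vector: per-coefficient mean and std over time."""
    y, sr = librosa.load(path, sr=sr)                        # resample for comparable features
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```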
57 | ||||||||||||||||||||||||||||||
58 | ||||||||||||||||||||||||||||||
59 | ||||||||||||||||||||||||||||||
60 | ||||||||||||||||||||||||||||||
61 | ||||||||||||||||||||||||||||||
62 | ||||||||||||||||||||||||||||||
63 | ||||||||||||||||||||||||||||||
64 | ||||||||||||||||||||||||||||||
65 | ||||||||||||||||||||||||||||||
66 | ||||||||||||||||||||||||||||||
67 | ||||||||||||||||||||||||||||||
68 | ||||||||||||||||||||||||||||||
69 | ||||||||||||||||||||||||||||||
70 | ||||||||||||||||||||||||||||||
71 | ||||||||||||||||||||||||||||||
72 | ||||||||||||||||||||||||||||||
73 | ||||||||||||||||||||||||||||||
74 | ||||||||||||||||||||||||||||||
75 | ||||||||||||||||||||||||||||||
76 | ||||||||||||||||||||||||||||||
77 | ||||||||||||||||||||||||||||||
78 | ||||||||||||||||||||||||||||||
79 | ||||||||||||||||||||||||||||||
80 | ||||||||||||||||||||||||||||||
81 | ||||||||||||||||||||||||||||||
82 | ||||||||||||||||||||||||||||||
83 | ||||||||||||||||||||||||||||||
84 | ||||||||||||||||||||||||||||||
85 | ||||||||||||||||||||||||||||||
86 | ||||||||||||||||||||||||||||||
87 | ||||||||||||||||||||||||||||||
88 | ||||||||||||||||||||||||||||||
89 | ||||||||||||||||||||||||||||||
90 | ||||||||||||||||||||||||||||||
91 | ||||||||||||||||||||||||||||||
92 | ||||||||||||||||||||||||||||||
93 | ||||||||||||||||||||||||||||||
94 | ||||||||||||||||||||||||||||||
95 | ||||||||||||||||||||||||||||||
96 | ||||||||||||||||||||||||||||||
97 | ||||||||||||||||||||||||||||||
98 | ||||||||||||||||||||||||||||||
99 | ||||||||||||||||||||||||||||||
100 |