Sound Recognition with Edge Impulse
Objective
This tutorial will guide you through the process of building a vacuum cleaner sound recognizer with Edge Impulse and deploying it to the Microchip Curiosity Ultra development board. This post includes the data, trained neural network model, and deployment code so you can get up and running quickly, but it will also explain the development process step by step so that you may learn how to develop your own sound recognizer.
The steps in this guide are summarized by the points below:
- Set up the SAME54 Curiosity Ultra board plus WM8904 daughterboard
- Review the operation of the pre-built sound classifier firmware
- Set up the custom processing block server for LogMFE feature extraction
- Clone and review the vacuum-recognition-demo Edge Impulse project
- Modify the Edge Impulse deployment code to support LogMFE features
Materials
Hardware Tools
- SAME54 Curiosity Ultra development board
- WM8904 audio codec daughterboard
- Analog microphone with 3.5mm connector
Software Tools
- MPLAB® X IDE
- MPLAB® IPE
- Edge Impulse Studio (Free account; No download required)
Exercise Files
- The firmware and MPLAB X IDE project files can be found in the GitHub repository.
- The dataset used in this tutorial can be downloaded from the latest GitHub release.
- Pre-built firmware for the vacuum sound recognizer can be downloaded from the latest GitHub release.
- The vacuum-recognition demo project used in this guide is available as an Edge Impulse project.
Procedure
Before we get started, you'll need to install and set up the required software as detailed in the steps below.
Configuring the Hardware
To enable audio collection for the SAME54, we first need to install the WM8904 daughterboard and configure the board’s jumpers appropriately. We’ll use Figure 2 (taken from the user guide) as a map for the different components on the board.
Set the jumpers on the audio codec board to match Figure 3; each jumper should be connected to the two left-most pins on the board when viewed from the perspective of the audio jacks.
Set the CLK SELECT jumper (labeled 10 in Figure 2) so that it connects the MCLK and PA17 pins on the board as shown in Figure 4. This pin configuration lets the WM8904 act as the clock master to the SAME54’s I2S peripheral.
Great! The hardware is now configured for the demo project. If you have MPLAB X IDE open, it should automatically detect the Curiosity board.
For more detailed information about the Curiosity Ultra board including schematics, consult the user guide.
Sound Recognition Firmware Overview
Before jumping into the steps to develop a sound recognizer from scratch, let's quickly cover the pre-compiled firmware for vacuum cleaner detection that accompanies this post. Go ahead and program your device with the firmware HEX file from the latest release using the MPLAB IPE tool before moving on.
With the firmware loaded, try turning on a nearby vacuum cleaner; after a short delay, the firmware will strobe the onboard LED1 located at the top left of the development board near the barrel connector; see Figure 5 for reference.
The firmware also prints the classification confidence values over the UART port. To read the UART output, use a terminal emulator of your choice (e.g., PuTTY for Windows) with the following settings:
- Baud rate: 115200
- Data bits: 8
- Stop bits: 1
- Parity: None
For reference, an example of the output is shown in Figure 6. Notice that the confidence values fall between 0 (no confidence) and 1 (certainty) and that the class confidences sum to roughly 1.
That covers the operation of the firmware; let's move on to the steps needed to reproduce it from scratch.
Data Collection
As always with machine learning projects, the first thing we need is data. A dataset for vacuum cleaner detection has already been compiled for this project from publicly available datasets (namely MS-SNSD and DEMAND). The dataset targets detecting a vacuum cleaner in a domestic environment in a way that is robust to common household noise; it includes several scenarios of a vacuum cleaner running indoors along with a mix of background noise covering speech, air conditioners, and common domestic acoustic activity such as dishwashing, laundry, and music playback. The vacuum cleaner data is included with the vacuum-recognition-demo Edge Impulse project (covered later on), but it can also be downloaded separately from the GitHub repository.
If you plan on collecting a dataset for your own application, make sure to introduce enough variation into your data so that your final model generalizes well to unseen scenarios. You'll also want to collect enough data for each audio class; a good starting point is 5-10 minutes per class, but the right amount depends on the audio class and the quality of the data collected. For example, if your data is inherently noisy (i.e., contains a lot of non-salient information), more data may be required to learn the acoustic activity of interest.
Custom Features with Edge Impulse
The project accompanying this guide has been developed and optimized to use a feature type that is not built into Edge Impulse Studio, so before we can move ahead you'll need to set up your own custom processing block server for the feature extraction step.
Custom processing blocks are an Edge Impulse feature that lets users plug in their own feature extraction blocks generically via an HTTP interface. This functionality can be used to add support for additional feature types, allow more advanced feature reconfigurability, and even allow for customized data visualizations inside Edge Impulse Studio. Here we use this functionality to add the LogMFE feature type to Edge Impulse Studio.
If you'd prefer to skip these extra steps, you can try using the built-in MFCC or MFE feature blocks instead; however, your end application performance may differ significantly from the results published here.
Log Mel-Frequency Energy Features
For this project, we use the logarithm of the Mel-frequency energy (LogMFE) - a feature set that is widely used in machine learning for audio tasks, especially where speech content is not the primary interest. A visualization of the LogMFE and MFCC features for one of the dataset samples is shown in Figure 7. The figure illustrates the LogMFE features' increased sensitivity to the vacuum cleaner activity; in particular, the sustained tonal content produced by the vacuum (the horizontal lines in the plot) is more easily distinguishable in the LogMFE spectrogram than in the MFCC features.
Besides the suggestive visual evidence, the neural network developed with LogMFE features showed improved performance over the MFCC and MFE variants of this project - at least for the particular dataset and configuration parameters explored - which is why LogMFE was selected.
LogMFE Computation
The following pseudo-code summarizes the LogMFE feature extraction process:
# x <- frame of input audio samples
# w <- frequency analysis window (same length as x)
# H_mel <- the Mel filterbank matrix
# Window multiply, then apply the Real Fast Fourier Transform (RFFT)
X = rfft(x * w)
# Compute the normalized power spectrum
X_pow = abs(X)^2 / N_FFT
# Apply the filterbank (matrix multiply) to get the Mel-frequency energy bins
X_mfe = X_pow @ H_mel
# Apply the log function to get the final LogMFE features
X_logmfe = log(X_mfe)
Pseudo-code for the Log Mel-frequency transform
Setting Up the Custom LogMFE Feature Block Server
To generate the LogMFE features for Edge Impulse, you must set up an HTTP server that accepts the raw audio data and sends back the processed features. The server's URL can then be plugged into our Impulse, as covered in the next section. Start by cloning the GitHub repository that accompanies this guide:
git clone https://github.com/MicrochipTech/ml-same54-cult-wm8904-edgeimpulse-sed-demo
Impulse Creation
With the LogMFE feature server set up, we can now define and train our Impulse.
Once the vacuum-recognition-demo project has finished copying into your Edge Impulse account, navigate to the Create Impulse tab to set the overall Impulse configuration:
Click the Edit button on the LogMFE audio block (shown as a pencil icon). In the resulting pop-up dialog, enter the public URL for your HTTP server that was generated previously, then click Confirm URL.
Window size affects how much context is given to the classifier per input; too small a window can make it difficult for the algorithm to differentiate sounds, whereas too large a window incurs a latency penalty.
Window increase determines the amount of overlap between successive classifier inputs (e.g., a 1000 ms window with a 250 ms increase gives 75% overlap). Overlapping inputs can serve both to augment the available data and to build some time invariance into the learning process, although there is little benefit to going beyond 75% overlap.
Navigate to the LogMFE tab to set up the LogMFE feature configuration. Note that if you choose to use the built-in MFCC or MFE features instead, you can use a similar configuration to the one shown here, but there are additional parameters you may need to choose that won't be covered in this guide. Configure the LogMFE parameters according to Figure 10, then switch to the Generate Features tab and click the Generate Features button.
These parameters were chosen to minimize the RAM and processing cost of the feature extraction process while still maintaining good model performance; they may not work well for all applications.
A Frame Length of 20-25 ms is common for general audio classification tasks, but since temporal resolution is less important for this application, it's preferable to use a frame length that matches the FFT Length - 32 ms (512 samples at 16 kHz) - for computational efficiency, as the quick check below confirms.
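As a quick sanity check of that arithmetic, the following standalone snippet (not part of the firmware) confirms that the frame and FFT lengths line up:

#include <cmath>

int main() {
    const float fs = 16000.0f;          // sample rate (Hz)
    const float frame_length = 0.032f;  // Frame Length parameter (seconds)
    // 0.032 s * 16000 samples/s = 512 samples, exactly the FFT Length,
    // so no zero padding is required before the FFT
    const long frame_samples = std::lroundf(fs * frame_length);
    return (frame_samples == 512) ? 0 : 1;
}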
The use of convolutional layers helps keep the number of parameters in the network small thanks to weight sharing. The use of 1-D convolutions (convolution over time) also helps minimize the model's RAM and processing requirements.
Navigate to the Model testing tab to check neural network performance. Click the Classify All button to evaluate the model on the test dataset. Figure 12 shows the result for the vacuum cleaner test dataset.
At this point, the model is trained and tested and we can move on to deploying to our hardware.
Deploying Your Impulse
Follow the steps below to deploy your Impulse and integrate it into your existing MPLAB X IDE project.
Use the MPLAB X IDE project that accompanies this guide as a starting point for your own project. This will save you the trouble of doing the hardware and project configuration yourself.
Switch to the Deployment tab in Edge Impulse Studio and select the C/C++ deployment option (C++ library).
Rename all the .cc files from the Edge Impulse library to use a .cpp suffix so that MPLAB X IDE recognizes them as C++ source files. You can do this in one shot with commands like the following (bash shown; run from your project directory):

find src -name '*.cc' -exec sh -c 'mv "$1" "${1%.cc}.cpp"' _ {} \;
Adding LogMFE to the Edge Impulse Inferencing SDK
At this point, the Edge Impulse Inferencing SDK should be fully integrated into your project. However, we still need to add support for the custom LogMFE audio feature to the deployed SDK. Luckily, this requires only minimal changes, made by directly modifying the MFE feature code as detailed in the steps below.
If you have doubts about any of the steps below, take a look at the firmware source code accompanying this post where these changes have already been implemented.
Using the tool of your choice, generate a square-root Hann window that matches the Frame Length parameter from the LogMFE block. This is the windowing function that will be applied to your input signal before the Fourier transform. The following code snippet generates the window using Python and the NumPy library.
The square-root Hann window - the square root of the Hann (also called Hanning) window, a common window for audio applications - is used for this project. Note that it's possible to use other window types, but the window must match the one used in the custom LogMFE feature block during model development.
import numpy as np

# Sampling frequency
Fs = 16000
# Frame length (must match the Frame Length parameter from the LogMFE block)
Frame_length = 0.032
L = int(Fs * Frame_length)
print(np.sqrt(np.hanning(L + 1))[1:])
Generating the square-root Hann window with NumPy
Open src/edge-impulse-sdk/dsp/speechpy/feature.hpp
At the beginning of the speechpy namespace near the top of the file, define a new float array named window and initialize it with the coefficients of the window generated in the previous step, as sketched below.
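A sketch of the declaration is shown below; the array length assumes the 32 ms frame at 16 kHz used in this project, and the coefficient values are those printed by the NumPy snippet:

namespace speechpy {

// Square-root Hann window generated in the previous step
// (512 coefficients for a 32 ms frame at 16 kHz)
static const float window[512] = {
    // ...paste the coefficients printed by the NumPy snippet here...
};

// ...rest of the speechpy namespace...

Sketch of the window array declaration in feature.hpp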
Locate the mfe() function. Inside the for loop, apply the window multiply after the call to signal->get_data() and before the call to processing::power_spectrum() as shown in Figure 15.
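The change amounts to an element-wise multiply over the frame buffer, roughly as follows (a sketch only - the buffer and length variable names are placeholders, so match them to the names used in your SDK version or compare with the accompanying firmware source):

// Inside the frame loop of mfe(), after signal->get_data() fills the
// current frame and before processing::power_spectrum() is called:
for (size_t i = 0; i < frame_length_samples; i++) {
    frame_buffer[i] *= window[i]; // apply the square-root Hann window
}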
At the end of the mfe() function before the return statement, apply a logarithm to the MFE output features as shown in Figure 16.
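In code, this is a single pass over the output matrix (a sketch; the small epsilon guarding against log(0) is an assumption - match the value used by the custom LogMFE feature block):

// At the end of mfe(), before the return: convert MFE to LogMFE in place
for (size_t i = 0; i < out_features->rows * out_features->cols; i++) {
    out_features->buffer[i] = logf(out_features->buffer[i] + 1e-10f);
}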
Open src/edge-impulse-sdk/classifier/ei_run_classifier.h and locate the calc_cepstral_mean_and_var_normalization_mfe() function. Comment out the line calling cmvnw() and add a line calling the numpy::normalize() function as shown in Figure 17. This disables the mean subtraction step while still applying min-max normalization.
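The modified section ends up looking roughly like this (a sketch; the original cmvnw() argument list is abbreviated):

// int ret = speechpy::processing::cmvnw(matrix, config->win_size, /* ... */); // disabled: mean subtraction
int ret = numpy::normalize(matrix); // min-max normalization only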
Open src/model-parameters/model_metadata.h.
Near the bottom of the file, find the instantiation of ei_dsp_config_mfe_t and add a comma and a 0 to the end of the initializer as shown in Figure 18; this sets win_size to 0.
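The tail of the initializer then looks something like this (a sketch only - the variable name and the generated field values come from your own file and stay unchanged):

ei_dsp_config_mfe_t ei_dsp_config = {
    /* ...fields generated by Edge Impulse Studio, unchanged... */
    , 0 // win_size = 0
};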
Okay, the LogMFE feature should now be integrated into your source code and you should be ready to compile. Go ahead and click the Make and Program Device button in the toolbar to compile and flash your firmware to the SAME54 MCU.
Final Remarks
That's it! You should now have a basic understanding of developing a sound recognition application with Edge Impulse and Microchip hardware.
For more details about integrating your Impulse with an existing MPLAB X IDE project, check out our "Integrating the Edge Impulse Inferencing SDK" article.
To learn more about Edge Impulse Studio, including tutorials for other machine learning applications, go to the Edge Impulse Docs Getting Started page.