Sound Recognition with Edge Impulse
Objective
This tutorial will guide you through the process of building a vacuum cleaner sound recognizer with Edge Impulse and deploying it to the Microchip Curiosity Ultra development board. This post includes the data, trained neural network model, and deployment code so you can get up and running quickly, but it will also explain the development process step by step so that you may learn how to develop your own sound recognizer.
The steps in this guide are summarized by the points below:
- Set up the SAME54 Curiosity Ultra board plus WM8904 daughterboard
- Review the operation of the pre-built sound classifier firmware
- Set up the custom processing block server for LogMFE feature extraction
- Clone and review the vacuum-recognition-demo Edge Impulse project
- Modify the Edge Impulse deployment code to support LogMFE features
Materials
Hardware Tools
- SAME54 Curiosity Ultra development board
- WM8904 audio codec daughterboard
- Analog microphone with 3.5mm connector
Software Tools
- MPLAB® X IDE
- MPLAB® IPE
- Edge Impulse Studio (Free account; No download required)
Exercise Files
- The firmware and MPLAB X IDE project files can be found in the GitHub repository.
- The dataset used in this tutorial can be downloaded from the latest GitHub release.
- Pre-built firmware for the vacuum sound recognizer can be downloaded from the latest GitHub release.
- The vacuum-recognition demo project used in this guide is available as an Edge Impulse project.
Procedure
Before we get started, you'll need to install and set up the required software as detailed in the steps below.
Configuring the Hardware
To enable audio collection for the SAME54, we first need to install the WM8904 daughterboard and configure the board’s jumpers appropriately. We’ll use Figure 2 (taken from the user guide) as a map for the different components on the board.
Set the jumpers on the audio codec board to match Figure 3; each jumper should be connected to the two left-most pins on the board when viewed from the perspective of the audio jacks.
Set the CLK SELECT jumper (labeled 10 in Figure 2) so that it connects the MCLK and PA17 pins on the board as shown in Figure 4. This pin configuration lets the WM8904 act as the clock master to the SAME54’s I2S peripheral.
Great! The hardware is now configured for the demo project. If you have MPLAB X IDE open, it should automatically detect the Curiosity board.
For more detailed information about the Curiosity Ultra board including schematics, consult the user guide.
Sound Recognition Firmware Overview
Before jumping into the steps to develop a sound recognizer from scratch, let's quickly cover the pre-compiled firmware for vacuum cleaner detection that accompanies this post. Go ahead and program your device with the firmware HEX file from the latest release using the MPLAB IPE tool before moving on.
With the firmware loaded, try turning on a nearby vacuum cleaner; after a short delay, the firmware will strobe the onboard LED1 located at the top left of the development board near the barrel connector; see Figure 5 for reference.
The firmware also prints the classification confidence values over the UART port. To read the UART output, use a terminal emulator of your choice (e.g., PuTTY for Windows) with the following settings:
- Baud rate: 115200
- Data bits: 8
- Stop bits: 1
- Parity: None
For reference, an example of the output is shown in Figure 6. Notice that the confidence values fall between 0 (no confidence) and 1 (certainty) and that the class confidences sum to roughly 1.
That covers the operation of the firmware; let's move on to the steps needed to reproduce it from scratch.
Data Collection
As always with machine learning projects, the first thing we need is data. A dataset for vacuum cleaner detection has already been compiled for this project from publicly available datasets (namely MS-SNSD and DEMAND). The dataset targets detecting a vacuum cleaner in a domestic environment in a way that is robust to common household noise; it includes several scenarios of a vacuum cleaner running indoors along with a mix of background noise covering speech, air conditioners, and common domestic acoustic activity such as dishwashing, laundry, and music playback. The vacuum cleaner data is included with the vacuum-recognition-demo Edge Impulse project (covered later on), but it can also be downloaded separately from the GitHub repository.
If you plan on collecting a dataset for your own application, make sure to introduce enough variation into your data so that your final model generalizes well to unseen scenarios. You'll also want to collect enough data for each audio class; a good starting point is 5-10 minutes per class, but the right amount depends on the audio class and the quality of the data collected. For example, if your data is inherently noisy (i.e., contains a lot of non-salient information), more data may be required to learn the acoustic activity of interest.
Custom Features with Edge Impulse
The project accompanying this guide has been developed and optimized to use a feature type that is not built into Edge Impulse Studio, so before we can move ahead you'll need to set up your own custom processing block server for the feature extraction step.
Custom processing blocks are an Edge Impulse feature that lets users plug in their own feature extraction blocks generically via an HTTP interface. This functionality can be used to add support for additional feature types, allow more advanced feature reconfigurability, and even allow for customized data visualizations inside Edge Impulse Studio. Here we use this functionality to add the LogMFE feature type to Edge Impulse Studio.
If you'd prefer to skip these extra steps, you can try using the built-in MFCC or MFE feature blocks instead; however, your end application performance may differ significantly from the results published here.
Log Mel-Frequency Energy Features
For this project, we use the logarithm of the Mel-frequency energy (LogMFE) - a feature set that is widely used in machine learning for audio tasks, especially where speech content is not the primary interest. A visualization of the LogMFE and MFCC features for one of the dataset samples is shown in Figure 7. The figure illustrates the LogMFE features' increased sensitivity to the vacuum cleaner activity; in particular, the sustained tonal content produced by the vacuum (the horizontal lines in the plot) is more easily distinguishable in the LogMFE spectrogram than in the MFCC features.
Besides the suggestive visual evidence, the neural network developed with LogMFE features showed improved performance over the MFCC and MFE variants of this project - at least for the particular dataset and configuration parameters explored - which is why LogMFE was selected.
LogMFE Computation
The following pseudo-code summarizes the LogMFE feature extraction process:
# x <- frame of input audio samples
# w <- frequency analysis window (same length as x)
# H_mel <- the Mel filterbank matrix
# Window multiply, then apply the Real Fast Fourier Transform (RFFT)
X = rfft(x * w)
# Compute the normalized power spectrum
X_pow = abs(X)^2 / N_FFT
# Apply the filterbank (matrix multiply) to get the Mel-frequency energy bins
X_mfe = X_pow @ H_mel
# Apply the log function to get the final LogMFE features
X_logmfe = log(X_mfe)
Pseudo-code for the Log Mel-frequency transform
Setting Up the Custom LogMFE Feature Block Server
To generate the LogMFE features for Edge Impulse, you must set up an HTTP server that accepts the raw audio data and sends back the processed features. The server's URL can then be plugged into our Impulse, as covered in the next section. Start by cloning the GitHub repository that accompanies this guide:
git clone https://github.com/MicrochipTech/ml-same54-cult-wm8904-edgeimpulse-sed-demo
Impulse Creation
With the LogMFE feature server set up, we can now define and train our Impulse.
Once the vacuum-recognition-demo project has finished copying into your Edge Impulse account, navigate to the Create Impulse tab to set the overall Impulse configuration:
Click the Edit button on the LogMFE audio block (shown as a pencil icon). In the resulting pop-up dialog, enter the public URL for your HTTP server that was generated previously, then click Confirm URL.
Window size affects how much context is given to the classifier per input; too small a window can make it difficult for the algorithm to differentiate sounds, whereas too large a window incurs a latency penalty.
Window increase determines the amount of overlap between successive classifier inputs (e.g., a 1000 ms window with a 250 ms increase gives 75% overlap). Overlapping inputs can serve both to augment the available data and to build some time invariance into the learning process, although there is little benefit to going beyond 75% overlap.
Navigate to the LogMFE tab to set up the LogMFE feature configuration. Note that if you choose to use the built-in MFCC or MFE features instead, you can use a similar configuration to the one shown here, but there are additional parameters you may need to choose that won't be covered in this guide. Configure the LogMFE parameters according to Figure 10, then switch to the Generate Features tab and click the Generate Features button.
These parameters were chosen to minimize the RAM and processing cost of the feature extraction process while still maintaining good model performance; they may not work well for all applications.
A Frame Length of 20-25 ms is common for general audio classification tasks, but since temporal resolution is less important for this application, it's preferable to use a frame length that matches the FFT Length - 32 ms (512 samples at 16 kHz) - for computational efficiency, as the quick check below confirms.
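As a quick sanity check of that arithmetic, the following standalone snippet (not part of the firmware) confirms that the frame and FFT lengths line up:

#include <cmath>

int main() {
    const float fs = 16000.0f;          // sample rate (Hz)
    const float frame_length = 0.032f;  // Frame Length parameter (seconds)
    // 0.032 s * 16000 samples/s = 512 samples, exactly the FFT Length,
    // so no zero padding is required before the FFT
    const long frame_samples = std::lroundf(fs * frame_length);
    return (frame_samples == 512) ? 0 : 1;
}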
The use of convolutional layers helps keep the number of parameters in the network small thanks to weight sharing. The use of 1-D convolutions (convolution over time) also helps minimize the model's RAM and processing requirements.
Navigate to the Model testing tab to check neural network performance. Click the Classify All button to evaluate the model on the test dataset. Figure 12 shows the result for the vacuum cleaner test dataset.
At this point, the model is trained and tested and we can move on to deploying to our hardware.
Deploying Your Impulse
Follow the steps below to deploy your Impulse and integrate it into your existing MPLAB X IDE project.
Use the MPLAB X IDE project that accompanies this guide as a starting point for your own project. This will save you the trouble of doing the hardware and project configuration yourself.
Switch to the Deployment tab in Edge Impulse Studio and select the C/C++ deployment option (C++ library).
Rename all the .cc files from the Edge Impulse library to use a .cpp suffix so that MPLAB X IDE recognizes them as C++ source files. You can do this in one shot with commands like the following (bash shown; run from your project directory):

find src -name '*.cc' -exec sh -c 'mv "$1" "${1%.cc}.cpp"' _ {} \;
Adding LogMFE to the Edge Impulse Inferencing SDK
At this point, the Edge Impulse Inferencing SDK should be fully integrated into your project. However, we still need to add support for the custom LogMFE audio feature to the deployed SDK. Luckily, this requires only minimal changes, made by directly modifying the MFE feature code as detailed in the steps below.
If you have doubts about any of the steps below, take a look at the firmware source code accompanying this post where these changes have already been implemented.
Using the tool of your choice, generate a square-root Hann window that matches the Frame Length parameter from the LogMFE block. This is the windowing function that will be applied to your input signal before the Fourier transform. The following code snippet generates the window using Python and the NumPy library.
The square-root Hann window - the square root of the Hann (also called Hanning) window, a common window for audio applications - is used for this project. Note that it's possible to use other window types, but the window must match the one used in the custom LogMFE feature block during model development.
import numpy as np

# Sampling frequency
Fs = 16000
# Frame length (must match the Frame Length parameter from the LogMFE block)
Frame_length = 0.032
L = int(Fs * Frame_length)
print(np.sqrt(np.hanning(L + 1))[1:])
Generating the square-root Hann window with NumPy
Open src/edge-impulse-sdk/dsp/speechpy/feature.hpp
At the beginning of the speechpy namespace near the top of the file, define a new float array named window and initialize it with the coefficients of the window generated in the previous step, as sketched below.
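A sketch of the declaration is shown below; the array length assumes the 32 ms frame at 16 kHz used in this project, and the coefficient values are those printed by the NumPy snippet:

namespace speechpy {

// Square-root Hann window generated in the previous step
// (512 coefficients for a 32 ms frame at 16 kHz)
static const float window[512] = {
    // ...paste the coefficients printed by the NumPy snippet here...
};

// ...rest of the speechpy namespace...

Sketch of the window array declaration in feature.hpp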
Locate the mfe() function. Inside the for loop, apply the window multiply after the call to signal->get_data() and before the call to processing::power_spectrum() as shown in Figure 15.
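The change amounts to an element-wise multiply over the frame buffer, roughly as follows (a sketch only - the buffer and length variable names are placeholders, so match them to the names used in your SDK version or compare with the accompanying firmware source):

// Inside the frame loop of mfe(), after signal->get_data() fills the
// current frame and before processing::power_spectrum() is called:
for (size_t i = 0; i < frame_length_samples; i++) {
    frame_buffer[i] *= window[i]; // apply the square-root Hann window
}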
At the end of the mfe() function before the return statement, apply a logarithm to the MFE output features as shown in Figure 16.
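In code, this is a single pass over the output matrix (a sketch; the small epsilon guarding against log(0) is an assumption - match the value used by the custom LogMFE feature block):

// At the end of mfe(), before the return: convert MFE to LogMFE in place
for (size_t i = 0; i < out_features->rows * out_features->cols; i++) {
    out_features->buffer[i] = logf(out_features->buffer[i] + 1e-10f);
}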
Open src/edge-impulse-sdk/classifier/ei_run_classifier.h and locate the calc_cepstral_mean_and_var_normalization_mfe() function. Comment out the line calling cmvnw() and add a line calling the numpy::normalize() function as shown in Figure 17. This disables the mean subtraction step while still applying min-max normalization.
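The modified section ends up looking roughly like this (a sketch; the original cmvnw() argument list is abbreviated):

// int ret = speechpy::processing::cmvnw(matrix, config->win_size, /* ... */); // disabled: mean subtraction
int ret = numpy::normalize(matrix); // min-max normalization only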
Open src/model-parameters/model_metadata.h.
Near the bottom of the file, find the instantiation of ei_dsp_config_mfe_t and add a comma and a 0 to the end of the initializer as shown in Figure 18; this sets win_size to 0.
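The tail of the initializer then looks something like this (a sketch only - the variable name and the generated field values come from your own file and stay unchanged):

ei_dsp_config_mfe_t ei_dsp_config = {
    /* ...fields generated by Edge Impulse Studio, unchanged... */
    , 0 // win_size = 0
};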
Okay, the LogMFE feature should now be integrated into your source code and you should be ready to compile. Go ahead and click the Make and Program Device button in the toolbar to compile and flash your firmware to the SAME54 MCU.
Final Remarks
That's it! You should now have a basic understanding of developing a sound recognition application with Edge Impulse and Microchip hardware.
For more details about integrating your Impulse with an existing MPLAB X IDE project, check out our "Integrating the Edge Impulse Inferencing SDK" article.
To learn more about Edge Impulse Studio, including tutorials for other machine learning applications, go to the Edge Impulse Docs Getting Started page.