Overcoming the Challenges of Adding Machine Learning to Your Products

Organizing the Data for Machine Learning

Last modified by Microchip on 2025/01/28 15:42

Organizing your data involves categorizing events in the data and describing the events with metadata.

Categorize Events

To organize your data effectively, you need to label events based on what has been detected and where in the data it has been detected.

What has been detected? Determine what type of event has been detected. Security cameras may need to identify a bicycle, a person, or a bus. If you are monitoring a fan and detect an event, what is that event? Is it because the fan has been knocked over? Has it been blocked with some paper or another object? Is a bearing starting to fail? Or is it just operating normally?

Categorizing data isn't just about labeling events; you also need to label where in the data the events occur. Is there a time dependence in your data? Does it matter if an event occurs before or after another event? One event may trigger another event.
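One way to capture both the "what" and the "where" is to store each label together with the time span it covers. The following is a minimal sketch, not a prescribed format: the `LabeledEvent` class, its field names, and the example timings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabeledEvent:
    label: str      # what was detected
    start_s: float  # where the event begins, in seconds from the start of the recording
    end_s: float    # where the event ends


def occurs_before(first: LabeledEvent, second: LabeledEvent) -> bool:
    """Return True if `first` has ended by the time `second` starts."""
    return first.end_s <= second.start_s


# Hypothetical labels for one fan recording (timings are made up).
events = [
    LabeledEvent("knocked_over", 7.9, 8.3),
    LabeledEvent("normal", 0.0, 4.2),
    LabeledEvent("blocked", 4.2, 7.5),
]

# Sorting by start time makes the order of events explicit.
events_sorted = sorted(events, key=lambda e: e.start_s)
ordered_labels = [e.label for e in events_sorted]
```

Keeping the start and end times with each label makes it possible to ask later whether one event preceded, and possibly triggered, another.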

A machine learning model has detected several objects in this security camera image. Note that the bus has been detected as a car because that is the closest match to what the model has been trained on. If you are designing a machine learning model to warn people not to walk in traffic, does it matter if the bus is labeled as a car?
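Whether a bus labeled as a car matters depends on which classes the application actually needs to tell apart. As a rough sketch, assuming a hypothetical application that only distinguishes vehicles from pedestrians, detailed labels can be mapped to coarser classes before comparing them. The `COARSE_CLASS` table and both function names are illustrative assumptions, not part of any particular tool.

```python
# Hypothetical mapping from detailed labels to the classes the application needs.
COARSE_CLASS = {
    "car": "vehicle",
    "bus": "vehicle",
    "person": "pedestrian",
}


def to_coarse(label: str) -> str:
    """Map a detailed label to an application-level class ("other" if unknown)."""
    return COARSE_CLASS.get(label, "other")


def same_for_application(true_label: str, predicted_label: str) -> bool:
    """Return True if a mislabel makes no difference at the application level."""
    return to_coarse(true_label) == to_coarse(predicted_label)


bus_as_car_is_harmless = same_for_application("bus", "car")
person_as_car_is_harmless = same_for_application("person", "car")
```

Under this assumed mapping, confusing a bus with a car is harmless, while confusing a person with a car is not.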

Create Metadata

In addition to labeling the data, you need to create metadata for the data (i.e., data about the data). This metadata provides all the relevant contextual information relating to the events. Referring to the security camera example, there are labels around the cars, buses, and people. The metadata associated with these events might include the location of the bounding box in the image and the time of day the image was taken. Is it sunny or cloudy? This information might affect the lighting, which in turn might influence how the model learns.
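The security camera example could be recorded roughly as follows. This is only a sketch: the class names, field names, file name, pixel coordinates, and the fixed daylight hours are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Detection:
    label: str   # what was detected
    x: int       # left edge of the bounding box, in pixels
    y: int       # top edge of the bounding box, in pixels
    width: int
    height: int


@dataclass
class ImageRecord:
    filename: str
    captured_at: datetime  # time of day the image was taken
    weather: str           # e.g. "sunny" or "cloudy"
    detections: list = field(default_factory=list)


def is_daytime(record: ImageRecord, sunrise_hour: int = 7, sunset_hour: int = 19) -> bool:
    """Crude lighting check; the default hours are placeholder assumptions."""
    return sunrise_hour <= record.captured_at.hour < sunset_hour


# Hypothetical record for one camera frame.
record = ImageRecord(
    filename="camera_frame.jpg",
    captured_at=datetime(2024, 6, 1, 14, 30),
    weather="cloudy",
    detections=[
        Detection("car", 40, 120, 200, 90),
        Detection("person", 300, 150, 40, 110),
    ],
)
```

With the context stored next to the labels, images can later be grouped by lighting conditions to check whether the model behaves differently on sunny and cloudy days.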

Example: Monitoring a Fan

Figure 2 illustrates a fan monitoring demonstration. The fan is monitored with a three-axis accelerometer, a microphone positioned upstream from the fan, and a differential pressure sensor.

Categorizing Events

Figure 3 shows the data collected from this setup. Each event in the data is a tap, made by tapping on the fan with a screwdriver. The signal has quiet periods, and each event has specific characteristics: there are changes in frequency, variations in amplitude, and changes in the spectrum and duration of the signal.
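Because the taps are separated by quiet periods, a simple amplitude threshold is one possible way to locate candidate events before labeling them. The sketch below is not the method used in the demonstration; the function names, the threshold, the gap length, and the sample values are all assumptions.

```python
def find_events(samples, threshold, min_gap):
    """Return (start, end) index pairs of events; `end` is exclusive.

    A sample is active when its absolute value reaches `threshold`.
    An event closes once `min_gap` quiet samples have followed the
    last active sample, so brief dips inside a tap do not split it.
    """
    events = []
    start = None
    last_active = None
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active >= min_gap:
            events.append((start, last_active + 1))
            start = None
    if start is not None:  # the signal ended during an event
        events.append((start, last_active + 1))
    return events


def describe(samples, event):
    """Return the duration (in samples) and peak amplitude of one event."""
    start, end = event
    return {"duration": end - start, "peak": max(abs(v) for v in samples[start:end])}


# Made-up signal: two taps separated by a quiet period.
signal = [0.0, 0.1, 0.9, -0.8, 0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 1.2, -1.0, 0.0, 0.0, 0.0]
taps = find_events(signal, threshold=0.5, min_gap=3)
features = [describe(signal, tap) for tap in taps]
```

Each detected span can then be given a label, and simple characteristics such as duration and peak amplitude can be stored alongside it.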

Figure 3: Signals from the fan monitoring demo

Creating Metadata

There is also metadata associated with this fan tunnel. Different fans can be put into the rig, some with bearings and some without, and there are both 5V and 12V fans. The differential pressure sensor is a small micro Click board™ that comes in a few different variants, each with its own response time. There is also information about the environment: this is a very specific test rig with a certain tube length. You may notice a blockage with holes in it at the far end, which can be removed by undoing a couple of screws. Removing it changes the installation and might affect the data. All of this is metadata associated with the sample.
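The rig details listed above could be attached to every recorded sample, for instance as a small record per file. This is a sketch under assumptions: the field names, file names, sensor variant names, and tube length are placeholders, not values from the actual demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanSampleMetadata:
    fan_voltage_v: int            # 5 or 12
    has_bearings: bool
    pressure_sensor_variant: str  # placeholder variant name
    tube_length_mm: float         # placeholder value
    end_blockage_installed: bool


# Hypothetical recordings and their metadata.
samples = {
    "run_001.csv": FanSampleMetadata(12, True, "variant_a", 300.0, True),
    "run_002.csv": FanSampleMetadata(12, False, "variant_a", 300.0, False),
    "run_003.csv": FanSampleMetadata(5, True, "variant_b", 300.0, True),
}


def select(all_samples, **conditions):
    """Return the names of samples whose metadata matches every condition."""
    return sorted(
        name
        for name, meta in all_samples.items()
        if all(getattr(meta, key) == value for key, value in conditions.items())
    )


twelve_volt_runs = select(samples, fan_voltage_v=12)
blocked_bearing_runs = select(samples, has_bearings=True, end_blockage_installed=True)
```

Storing the metadata in a structured form makes it straightforward to compare like with like, for example training or evaluating only on samples recorded with the blockage installed.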
