
Picking features and implementing LibSVM

Tina Z, June 9, 2016

Last week I wrote a few scripts to automate data collection on a Raspberry Pi. Now that I’ve finished collecting data, I can finally implement the machine learning algorithm mentioned in last week’s blog post.

Without going too deep into the details: in machine learning, a support vector machine (SVM) learns to classify data by first analyzing a training set, mapping input data to output classes. Once trained, it can be used to classify real-world examples.


Fig. 1: Concept of a multi-class SVM, from Andrew Ng’s Stanford open-source machine learning course

In the figure above, we have to differentiate between four different kinds of objects, which are our output classes. If the classes are ordered [cross triangle circle square], the output of our machine learning algorithm will be a vector like [0 1 0 0], meaning the given data belong to the triangle class. The commonly used LibSVM library, which can be used as a Matlab add-on, makes this easy for us. We first have to pick a kernel, which “draws” the boundaries dividing the classes in a particular way (e.g. a linear kernel draws straight lines, while nonlinear kernels can draw curved boundaries). Then, by giving it a set of data and some parameters, we’ll be good to go.
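To make the label encoding concrete, here is a tiny Python sketch (class names are just the shapes from the figure; note that LibSVM itself actually takes integer class labels rather than one-hot vectors, so both encodings are shown):

```python
# Map each class name to a one-hot vector and an integer label.
classes = ["cross", "triangle", "circle", "square"]

def one_hot(name):
    """One-hot vector for a class, e.g. 'triangle' -> [0, 1, 0, 0]."""
    return [1 if c == name else 0 for c in classes]

def label(name):
    """Integer label (index into classes), the form LibSVM expects."""
    return classes.index(name)

print(one_hot("triangle"))  # [0, 1, 0, 0]
print(label("triangle"))    # 1
```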

Like last week, I’ve written some code that might be an okay starting point for this. I have 20 data sets from last week, sampled at 500 kHz for 1 second after page load (by accident; I meant to sample for 5 seconds, so I may have to go back and take more data). This is DC current, by the way: the Picoscope measured voltage across a sensing resistor, so all the DC voltage values were divided by its resistance, 1.4 ohms.
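The voltage-to-current conversion is just Ohm’s law, I = V/R. A minimal Python sketch of the same conversion (the float32 file layout mirrors the Matlab fread(…, 'float') call below, but the file name is a stand-in):

```python
import struct

R_SENSE = 1.4  # sensing resistor, in ohms

def volts_to_amps(samples, r=R_SENSE):
    """Convert voltage samples measured across the sense resistor to current."""
    return [v / r for v in samples]

def read_float32(path):
    """Read a binary file of little-endian float32 samples."""
    with open(path, "rb") as f:
        raw = f.read()
    n = len(raw) // 4
    return list(struct.unpack("<%df" % n, raw))

# Example: 0.7 V across 1.4 ohms is 0.5 A.
print(volts_to_amps([0.7, 1.4]))
```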

 

A few things are again worrisome; I’ve flagged them in the comments. Matlab code, saved as LibsvmTest.m:

data = [];            % each row will hold one training set's samples
m = 10;               % number of training sets
Fs = 0.5*10^6;        % sampling frequency (500 kHz)
data_secs = 1;        % seconds of data per set

% -- examples
variables = {'var1' 'var2' 'var3'};            % cell array, not concatenated chars
outputs = {'yahoo' 'google' 'bing' 'unknown'}; % candidate output classes

% LibSVM wants an m-by-1 vector of integer class labels, not a one-hot
% matrix like eye(m); replace with each set's true class once labeled.
training_label_vector = (1:m)'; % placeholder: set i labeled as class i
training_instance_matrix = [];
% training_instance_matrix = [12 34 56; 0 1 2; 78 90 12]; % data

for i = 1:m
    fileID = fopen(['data' num2str(i) '.bin']);
    data = [data; (fread(fileID, 'float'))'];
    fclose(fileID);
end

% FFT over 0.2-second windows with 50% overlap:
% [FFT from 0-0.2, FFT from 0.1-0.3, ... FFT from 0.8-1]
FFT_N = 0.20*Fs;                % window length in samples
for k = 0:0.10:data_secs-0.20   % step = half the window -> 50% overlap
    beg = Fs*k + 1;
    en = Fs*k + FFT_N; % should window and zero-pad the data; unf., no Signal Processing Toolbox
    training_instance_matrix = [training_instance_matrix abs(fft(data(:, beg:en), FFT_N, 2)/FFT_N)];
end

model = svmtrain(training_label_vector, training_instance_matrix, '-t 0'); % -t 0 = linear kernel

The code is pretty self-explanatory: I read a few binary files’ worth of data, run an FFT on each set, and accumulate the results in a matrix. Seeing as I’m reading DC current, not AC, I probably had no reason to run the FFT, so I’ll likely get rid of it and use some statistics as my variables instead (mean power, etc.). The above is now part of the forked SPQR1 repo on Github.
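If the statistics route pans out, per-window features might look something like this Python sketch (the exact feature set — mean power, variance, peak — is still undecided, so treat these as placeholders):

```python
import statistics

def window_features(current_samples):
    """Summary statistics for one window of current samples (amps)."""
    power = [i * i for i in current_samples]  # proportional to instantaneous power
    return {
        "mean_current": statistics.mean(current_samples),
        "mean_power": statistics.mean(power),
        "variance": statistics.pvariance(current_samples),
        "peak": max(current_samples),
    }

feats = window_features([0.1, 0.2, 0.3, 0.2])
print(feats)
```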