Working on an Automated Testing Setup

Tina Z, June 2, 2016

My project for the first week of summer is to try to replicate the results of the Current Events paper, which showed that it's possible to recognize which websites were accessed based on frequency analysis of a computer's power line, using a machine learning SVM. Initially I hoped to use old data sets to train a rudimentary SVM with the open source library libsvm, but we couldn't find the hard drive with the right data (or the data in the right hard drive). Seeing as a few years of PhD research data may now be lost in the void, I decided to collect power traces for each website accessed by a Raspberry Pi, compute an FFT, and format the data sets so the machine learning library can read them.
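So the pipeline is trace, then FFT, then libsvm-formatted feature vectors. As a rough sketch of that last formatting step (the function name, bin count, and labeling scheme are my own placeholders, not anything from the paper):

```python
import numpy as np

def trace_to_libsvm_features(trace, n_bins=500, label=1):
    """Turn one raw power trace into a libsvm-format line.

    FFT magnitudes are averaged down to n_bins features so that every
    trace yields a fixed-length vector, whatever its sample count.
    """
    spectrum = np.abs(np.fft.rfft(trace))    # magnitude spectrum
    bins = np.array_split(spectrum, n_bins)  # coarse frequency bins
    feats = [b.mean() for b in bins]
    # libsvm sparse format: "<label> 1:<f1> 2:<f2> ..."
    return str(label) + " " + " ".join(
        "%d:%.6f" % (i + 1, f) for i, f in enumerate(feats))
```

Each website would get a numeric label, and one such line per trace goes into the training file that libsvm's svm-train tool reads.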

However, to properly train a machine learning algorithm I'd like to have a lot of data sets - preferably at least 50 websites accessed, and high-resolution data if possible as well. Unfortunately, past experience has taught me to really, really dislike manual testing and data collection, not only because it introduces human error and setup changes but also because it's a tedious, soul-eroding process.
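Once there are enough labeled traces, the classification step itself is short. A minimal sketch using scikit-learn's SVC (which wraps libsvm) on random stand-in feature vectors, just to show the shape of the data the collection rig needs to produce:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_sites, traces_per_site, n_feats = 5, 20, 50

# Synthetic stand-ins for binned FFT magnitudes: one cluster per "website"
X = np.vstack([rng.normal(loc=k, size=(traces_per_site, n_feats))
               for k in range(n_sites)])
y = np.repeat(np.arange(n_sites), traces_per_site)  # website IDs as labels

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)
print("training accuracy:", clf.score(X, y))
```

With real data the interesting number is of course held-out accuracy, not training accuracy, which is why the 50-website count matters.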

In order to mitigate all that concerning stuff, I made the following basic setup: Python code on the Raspberry Pi opens a web browser to each URL and toggles a GPIO pin to trigger the Picoscope/oscilloscope, which then collects data for five seconds. The Picoscope is connected to my MacBook, and that five-second data collection is also automated by a Python script (based on sample code donated by kind grad student Connor Bolton); each log is saved to a .bin file.

Things that worry me are whether the Picoscope connection will stay reliable (it's sometimes flaky) and whether the fixed sleep times are long enough for every page to finish loading.

Python script on the Raspberry Pi: webtest.py

import RPi.GPIO as GPIO
import time
import subprocess

pin = 18
GPIO.setmode(GPIO.BCM)
GPIO.setup(pin, GPIO.OUT)

with open('testwebsites.txt', 'r') as file:
    for line in file:
        url = line.strip()  # drop the trailing newline
        print(url)
        GPIO.output(pin, GPIO.LOW)  # start low; the scope triggers on the rising edge
        p = subprocess.Popen(["chromium-browser", url])
        time.sleep(3)  # wait for the browser to finish loading
        GPIO.output(pin, GPIO.HIGH)  # rising edge starts the scope's five-second capture
        time.sleep(6)
        p.kill()

GPIO.cleanup()

Python script for the Picoscope: trigger_data_collect.py

import time
import struct
import sys
sys.path.insert(0, './../python_objects')
from pico_driver_5000a import ps5000a

class trigMeasure():
    def __init__(self):
        self.ps = ps5000a(serialNumber=None)

    def openScope(self):
        self.ps.open()
        self.data_count = 0

        self.ps.setChannel("A", coupling="DC", VRange=5.0, probeAttenuation=10)
        self.ps.setChannel("B", enabled=False)
        self.ps.setChannel("C", enabled=False)
        self.ps.setChannel("D", enabled=False)
        res = self.ps.setSamplingFrequency(1E6, 5E6)
        self.sampleRate = res[0]
        self.numSamples = res[1]
        print("Sampling @ %f MHz, %d samples" % (res[0]/1E6, res[1]))

        # Use the external trigger to mark when we sample
        self.ps.setSimpleTrigger(trigSrc="External", threshold_V=0.150, timeout_ms=5000)

    def closeScope(self):
        self.ps.close()

    def armMeasure(self):
        self.ps.runBlock()

    def measure(self):
        print("Waiting for trigger")
        while not self.ps.isReady():
            time.sleep(0.001)
        print("Triggered")
        self.data_count += 1
        data = self.ps.getDataV("A", self.numSamples)
        self.save(data)

    def save(self, data):
        # 'wb' clears any previous contents and writes binary
        with open('data' + str(self.data_count) + '.bin', 'wb') as f:
            for i in range(len(data)):
                entry = [i / self.sampleRate, data[i]]
                f.write(struct.pack('2f', *entry))  # two 32-bit floats: timestamp, voltage

if __name__ == "__main__":
    tm = trigMeasure()
    tm.openScope()

    try:
        while 1:
            tm.armMeasure()
            tm.measure()

    except KeyboardInterrupt:
        pass

    tm.closeScope()

^ The above is now part of the forked SPQR1 repo

If data collection works as intended, I really hope I can replicate or at least come close to the accuracy of past algorithms. I also want to see if it's possible to make an algorithm recognize multiple pages at once. For instance, can we create new training data by adding the FFTs' real and imaginary components, and then recognize when multiple pages are open? Are there predictable frequency changes when a browser goes from single-threaded to multi-threaded as people open multiple tabs? Since I don't really understand how the kernel scheduler handles this (it will probably vary from computer to computer, and some browsers use multiple processes instead of multiple threads), that probably wouldn't work, but it could be interesting to test a bit.
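The adding-FFTs idea at least has the math on its side: the FFT is linear, so summing two pages' complex spectra is exactly the spectrum of their summed power traces. A quick check on random stand-in traces:

```python
import numpy as np

rng = np.random.default_rng(0)
page_a = rng.normal(size=4096)  # stand-in power trace for page A
page_b = rng.normal(size=4096)  # stand-in power trace for page B

# FFT linearity: spectrum of the combined trace equals the sum of the
# individual complex spectra, so synthetic "two pages open" training
# vectors can be built straight from single-page FFTs
combined = np.fft.rfft(page_a + page_b)
summed = np.fft.rfft(page_a) + np.fft.rfft(page_b)
assert np.allclose(combined, summed)
```

The real caveat is the one above: two pages loading at once don't simply add their power draws, since the scheduler interleaves them, so this can only approximate the true combined signal.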

--

Log from 6/2

Got Raspberry Pi and Picoscope code to work. Managed to make Connor's code Mac-compatible and wrote the new data collection script above. Picoscope code is still kinda iffy; sometimes it can't connect.

Collect data tomorrow