Machine Intelligence Term Project

Group 8:

Rajat Bhavnani – 13EC10048

Ishan Jain – 13EC10024

M. Sujith Reddy – 13EC10034

Raunak Chitlangia – 13EC32010

Abhinav Sharma – 13EC10002

Robin Singh Sidhu – 13EC10052

Project Workflow

Data Collection

Collect mouse dynamics data
Collect keystroke dynamics data
Collect GUI data using the acquisition tool

Data Pre-processing

Used Python to convert the initial data into a CSV file containing the following columns: MM, MP, MR, MC, etc.
Using the CSV file, we extracted 17 features for each data sample

Model Training

The model was trained as a Naïve Bayes classifier using 5-fold cross-validation
Some contemporary models were also trained to compare against the effectiveness of the classifier

Model Testing

Data collected over a period of 4 weeks by our group and other groups was tested on our model, and the results were recorded.

Introduction

User identification is normally done with secret phrases known as passwords.
It can also be done using mouse or keyboard dynamics.
Users can be authenticated by their mouse operating styles.
For a re-authentication system to be practical, it must have the following features:
Accuracy
Quick response
Difficult to forge

Data Acquisition

Mouse data logger

Keyboard data logger

GUI data acquisition

Data Acquisition Statistics

Type of Data	Quantity
Mouse Dynamics	~24 log files
Keystroke Dynamics	~24 log files
GUI Data	180+ videos

Mouse Dynamics

cues and idiosyncrasies believed to be unique to the individual
measures and assesses a user's behavioral characteristics for use as a biometric
needs no specialized hardware
easy to implement and comprehend

Mouse Data Acquisition

- continuous mouse data for user authentication

- mood-dependent variability in mousing patterns

Since authentication is the primary concern, the experimental setup uses a controlled environment with fixed hardware.

Pre-Processing and feature identification

The strategy behind preprocessing is to detect each point-and-click action, where a click action can be described as the mouse movements after every click.

Continuous mouse actions are series of mouse actions or movements with short or no pauses between adjacent steps.

Within the $i$th point-and-click action for a user $c$, we can denote the $j$th mouse-move record as $\langle \text{mouse-move}, t_i, x_i, y_i \rangle_{c,j}$, where $t_i$ denotes the timestamp of the $i$th mouse action or movement.

From the information that belongs to every point-and-click movement, we can compute angle-based metrics.
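
The idea of grouping records into continuous mouse actions can be sketched as follows. This is a minimal illustration, not the project's actual pre-processing code: the record layout `(t, x, y)`, the function name `split_continuous`, and the pause threshold `max_pause` (in the same time unit as `t`) are all assumptions.

```python
def split_continuous(records, max_pause=0.5):
    """Group (t, x, y) records into continuous mouse actions.

    records   -- list of (t, x, y) tuples sorted by timestamp t (assumed layout)
    max_pause -- hypothetical threshold: a gap longer than this starts a new action
    """
    segments = []
    current = []
    for rec in records:
        # Start a new segment when the pause since the previous record is too long.
        if current and rec[0] - current[-1][0] > max_pause:
            segments.append(current)
            current = []
        current.append(rec)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    demo = [(0.0, 10, 10), (0.1, 12, 11), (0.2, 15, 13), (1.5, 40, 40), (1.6, 42, 41)]
    for segment in split_continuous(demo):
        print(len(segment), "records from t =", segment[0][0])
```

The threshold would have to be tuned on real logs; too small a value fragments a single movement, too large a value merges separate actions.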

Mouse Feature description

Statistical features:

single click stats
double click stats
offset from ideal mouse trajectory
elapsed time

Behavioral features:

speed variation with time
acceleration with time
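
The two behavioral features above can be derived from timestamped cursor positions. The sketch below assumes records of the form `(t, x, y)` sorted by time; the function names are illustrative and not taken from the project.

```python
import math


def speed_profile(records):
    """Return [(t, speed)] for each pair of consecutive (t, x, y) records.

    Speed is Euclidean distance divided by elapsed time, stamped with the
    time of the later record. Pairs with no elapsed time are skipped.
    """
    profile = []
    for (t0, x0, y0), (t1, x1, y1) in zip(records, records[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue
        profile.append((t1, math.hypot(x1 - x0, y1 - y0) / dt))
    return profile


def acceleration_profile(profile):
    """Return [(t, acceleration)] as the change in speed over elapsed time."""
    acc = []
    for (t0, v0), (t1, v1) in zip(profile, profile[1:]):
        acc.append((t1, (v1 - v0) / (t1 - t0)))
    return acc


if __name__ == "__main__":
    records = [(0.0, 0, 0), (1.0, 3, 4), (2.0, 9, 12)]
    speeds = speed_profile(records)
    print(speeds)
    print(acceleration_profile(speeds))
```

Summary statistics of these profiles (mean, variance, maximum) could then serve as per-user features.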

Different Features

Direction
Angle of Curvature
Curvature Distance
Speed
Pause and click

These features are believed to be unique to every user and can be used to characterise the behaviour of a user.
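
The slides do not define the angle-based features, so the sketch below uses definitions that are common in the mouse-dynamics literature and should be read as assumptions: for three consecutive cursor points A, B, C, direction is the angle of segment AB against the horizontal, angle of curvature is the angle ABC, and curvature distance is the perpendicular distance from B to line AC divided by the length of AC.

```python
import math


def direction(a, b):
    """Angle in degrees, in [0, 360), of segment a->b against the positive x axis."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360.0


def angle_of_curvature(a, b, c):
    """Angle ABC in degrees, or None if B coincides with A or C."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def curvature_distance(a, b, c):
    """Distance from B to line AC divided by |AC|, or None if A == C."""
    ac = math.hypot(c[0] - a[0], c[1] - a[1])
    if ac == 0:
        return None
    cross = abs((c[0] - a[0]) * (b[1] - a[1]) - (c[1] - a[1]) * (b[0] - a[0]))
    return (cross / ac) / ac


if __name__ == "__main__":
    A, B, C = (0, 0), (1, 1), (2, 0)
    print(direction(A, B), angle_of_curvature(A, B, C), curvature_distance(A, B, C))
```

Because these metrics are ratios and angles, they do not depend on screen resolution in the way raw pixel distances do.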

Data Pre-processing

The data logs were cleaned manually because multiple log-ins had left the mouse-click records discontinuous.
Used the MM, MR, MP, MC… data to extract 17 features specific to a person's mouse movement, such as scroll_count, scroll_time, left-click frequency, right-click frequency, speedX, speedY, etc.
We found that 7 of the 17 features were highly correlated with the target function. These features were used for training the Naïve Bayes classifier model.
We modified the pre-processed data to change the target function to user+emotion instead of just user, so that the model also predicts the emotion of the user during authentication.
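
The feature-selection and relabelling steps above could look roughly like the sketch below. The slides do not say how correlation was measured, so Pearson correlation against an integer-coded target is an assumption here, as are the helper names `rank_features` and `combined_label` and the toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(rows, feature_names, target_name):
    """Return [(feature, |r|)] sorted by absolute correlation with the target."""
    target = [row[target_name] for row in rows]
    scores = [
        (name, abs(pearson([row[name] for row in rows], target)))
        for name in feature_names
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def combined_label(user, emotion):
    """Build the joint user+emotion target described above."""
    return f"{user}+{emotion}"


if __name__ == "__main__":
    # Toy rows; user_id is a hypothetical integer encoding of the user.
    rows = [
        {"speedX": 1.0, "scroll_count": 2.0, "user_id": 0},
        {"speedX": 2.0, "scroll_count": 1.0, "user_id": 0},
        {"speedX": 3.0, "scroll_count": 2.0, "user_id": 1},
        {"speedX": 4.0, "scroll_count": 1.0, "user_id": 1},
    ]
    print(rank_features(rows, ["speedX", "scroll_count"], "user_id"))
    print(combined_label("user1", "happy"))
```

Keeping only the top-ranked names from `rank_features` mirrors the selection of 7 out of 17 features; with more than two users, an integer encoding imposes an arbitrary ordering, so a different relevance measure may be preferable.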

Model (Naïve Bayes Classifier)

Abstractly, naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $(x_1, \dots, x_n)$ of $n$ features (independent variables), it assigns to this instance probabilities

$$p(C_k \mid x_1, \dots, x_n)$$

for each of $K$ possible outcomes or classes $C_k$.

The problem with the above formulation is that if the number of features $n$ is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid x_1, \dots, x_n) = \frac{p(C_k)\, p(x_1, \dots, x_n \mid C_k)}{p(x_1, \dots, x_n)}$$

The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule.

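The corresponding Bayes classifier is the function that assigns a class label $\hat{y} = C_k$ for some $k$ as follows:

$$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$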
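
A minimal sketch of the MAP rule for discrete (binned) feature values follows. It estimates the prior and the per-feature likelihoods by relative frequency. The slides do not state which likelihood model the project used, so this categorical variant and the toy feature values are assumptions.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Estimate p(C_k) and p(x_i | C_k) by relative frequency."""
    class_counts = Counter(labels)
    priors = {c: count / len(labels) for c, count in class_counts.items()}
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, c in zip(samples, labels):
        for i, value in enumerate(x):
            value_counts[(c, i)][value] += 1
    return priors, value_counts, class_counts


def predict(model, x):
    """Return the class maximising p(C_k) * prod_i p(x_i | C_k)."""
    priors, value_counts, class_counts = model
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            score *= value_counts[(c, i)][value] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    # Toy data: (binned speed, binned click frequency) per session.
    samples = [("fast", "few"), ("fast", "few"), ("slow", "many"), ("slow", "few")]
    labels = ["user1", "user1", "user2", "user2"]
    model = train(samples, labels)
    print(predict(model, ("slow", "many")))
```

A practical implementation would usually sum log-probabilities instead of multiplying raw probabilities, to avoid underflow when there are many features.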

Model Testing Statistics

Series 1 - Only User Name Authentication

Series 2 - Only Mood Analysis

User authentication and mood analysis using 5-fold cross-validation.

Series 1 (blue) = User authentication

Series 2 (orange) = Mood analysis
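
For reference, 5-fold cross-validation can be sketched as below. The splitting scheme, the seed, and the `fit`/`predict` callables are illustrative assumptions; the stand-in majority-class model only makes the example runnable and is not the project's classifier.

```python
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    if n < k:
        raise ValueError("need at least k samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(samples, labels, fit, predict, k=5, seed=0):
    """Average test accuracy over k folds for any fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = fit([samples[j] for j in train], [labels[j] for j in train])
        correct = sum(predict(model, samples[j]) == labels[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(samples, labels):
    """Stand-in model: remember the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


if __name__ == "__main__":
    samples = list(range(10))
    labels = ["user1"] * 7 + ["user2"] * 3
    print(cross_val_accuracy(samples, labels, fit_majority, predict_majority))
```

Each sample is used for testing exactly once, so the averaged score is less sensitive to a lucky or unlucky single split.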

Discussion

We first attributed the failure to reach the desired accuracy to flaws in the data pre-processing. However, after training other models, such as binary classification, decision trees, and KNN, on the same pre-processed data, we obtained promising results for user authentication and emotion analysis. We therefore concluded that the low accuracy was actually due to the limitations of the Naïve Bayes classifier. Some of the reasons are:

Dependency of Features – Naïve Bayes assumes that the features used by the classifier are independent of each other. Because all of our features are derived from the same mouse log data, they tend to be correlated.

Data Scarcity – For every possible value of a feature, we need to estimate a likelihood using a frequentist approach. With little data, these estimates can be pushed towards 0 or 1, which in turn leads to numerical instabilities and worse results.
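
The effect can be shown with a toy frequency estimate. In the sketch below, `alpha=0` gives the plain relative-frequency estimate, under which a value never seen for a class gets likelihood 0 and wipes out the whole product in the decision rule. Additive (Laplace) smoothing with `alpha > 0` is a commonly used remedy and is shown only for comparison; it is not something the project is stated to have used. The function name and parameters are illustrative.

```python
from collections import Counter


def likelihood(observed, value, alpha=0.0, n_values=None):
    """Estimate p(value | class) from the values observed for that class.

    alpha    -- additive smoothing strength; 0 means plain relative frequency
    n_values -- number of distinct values the feature can take
                (defaults to the number of distinct values observed)
    """
    counts = Counter(observed)
    if n_values is None:
        n_values = len(counts)
    denominator = len(observed) + alpha * n_values
    if denominator == 0:
        raise ValueError("no observations and no smoothing")
    return (counts[value] + alpha) / denominator


if __name__ == "__main__":
    observed = ["fast", "fast", "slow"]  # toy binned speeds for one user
    print(likelihood(observed, "medium"))                       # unsmoothed
    print(likelihood(observed, "medium", alpha=1, n_values=3))  # smoothed
```

With only a handful of sessions per user, many feature values are never observed for a given class, which is where the unsmoothed estimate breaks down.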