Machine Intelligence Term Project

Group 8:

Rajat Bhavnani – 13EC10048

Ishan Jain – 13EC10024

M. Sujith Reddy – 13EC10034

Raunak Chitlangia – 13EC32010

Abhinav Sharma – 13EC10002

Robin Singh Sidhu – 13EC10052

Project Workflow

Data Collection

    • Collected mouse dynamics data
    • Collected keystroke dynamics data
    • Collected GUI data using a data acquisition tool

Data Pre-processing

    • Used Python to convert the raw log data into a CSV file containing the columns MM, MP, MR, MC, etc.
    • From the CSV file, we extracted 17 features for each data sample

Model Training

    • A Naïve Bayes classifier was trained using 5-fold cross-validation
    • The same data was also used to train other contemporary models, to compare their effectiveness with that of the Naïve Bayes classifier

Model Testing

    • Data collected over a period of 4 weeks by our group and other groups was used to test the model, and the results were recorded.
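The log-to-CSV conversion step could look roughly like the following minimal sketch. The raw log format is not described in this deck, so it assumes a hypothetical whitespace-separated line `EVENT TIMESTAMP X Y`, with event codes such as MM, MP, MR and MC. It also writes the code into a single `event` column next to the timestamp and coordinates; that layout, the column names and the function name `log_to_csv` are all illustrative assumptions.

```python
import csv
import io
from typing import Iterable, TextIO


def log_to_csv(log_lines: Iterable[str], out_file: TextIO) -> int:
    """Convert raw mouse-log lines into CSV rows and return the row count.

    Hypothetical input format: "EVENT TIMESTAMP X Y", e.g. "MM 1520 311 204",
    where EVENT is a code such as MM, MP, MR or MC.
    Blank or malformed lines are skipped rather than aborting the run.
    """
    writer = csv.writer(out_file)
    writer.writerow(["event", "t", "x", "y"])
    written = 0
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # wrong number of fields: treat as noise
        event, t, x, y = parts
        try:
            row = [event, float(t), int(x), int(y)]
        except ValueError:
            continue  # non-numeric timestamp or coordinate
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    raw = ["MM 0 10 10", "MM 16 14 12", "corrupted line", "MC 40 20 15"]
    buffer = io.StringIO()
    print(log_to_csv(raw, buffer), "rows written")
    print(buffer.getvalue())
```

Skipping malformed lines, rather than failing, reflects the fact that the logs needed manual cleaning. A real converter would need to match the actual logger output.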

Introduction

  • User identification is normally done with secret phrases known as passwords.
  • It can also be done using mouse or keystroke dynamics.
  • Users can be authenticated by their characteristic style of operating the mouse.
  • To be practical, a re-authentication system must have the following features:
      • Accuracy
      • Quick response
      • Resistance to forgery

Data Acquisition

Mouse data logger

Keyboard data logger

GUI data acquisition

Data Acquisition Statistics

Type of Data          Quantity
Mouse Dynamics        ~24 log files
Keystroke Dynamics    ~24 log files
GUI Data              180+ videos

Mouse Dynamics

  • Relies on cues and idiosyncrasies believed to be unique to each individual
  • Measures and assesses a user's behavioral characteristics for use as a biometric
  • Needs no specialized hardware
  • Easy to implement and understand

Mouse Data Acquisition

  • Continuous mouse data for user authentication
  • Mood-dependent variability in mousing patterns

Since authentication is the primary concern, the experimental setup uses a controlled environment with fixed hardware.

Pre-Processing and feature identification

  • The preprocessing strategy is to detect every point-and-click action, defined as the series of mouse movements between one click and the next, ending in a click.

  • Continuous mouse actions are series of mouse movements with short or no pauses between adjacent steps.

  • Within the $i$th point-and-click action of a user $c$, we denote the $j$th mouse-move record as $\langle \text{mouse move}, t_j, x_j, y_j \rangle_{c,i}$, where $t_j$ denotes the timestamp of the $j$th mouse movement and $(x_j, y_j)$ the cursor position at that time.

  • From the records belonging to each point-and-click action, we can compute angle-based metrics.
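The point-and-click segmentation described above can be sketched as follows. The record layout, the reading of "MM" as a mouse-move record and "MC" as a click, and the `max_pause` threshold of 500 ms are all hypothetical choices made for illustration. The deck only states that continuous actions have short or no pauses, not what threshold was used.

```python
from typing import List, Tuple

# (event code, timestamp in ms, x, y); codes other than MM/MC are ignored
Record = Tuple[str, float, int, int]


def segment_point_and_click(records: List[Record],
                            max_pause: float = 500.0) -> List[List[Record]]:
    """Split a time-ordered record stream into point-and-click actions.

    Assumptions (hypothetical, not taken from the original logs):
      * "MM" marks a mouse-move record and "MC" a click record;
      * an action is a run of moves ending in a click;
      * a gap longer than max_pause ms breaks continuity, so the moves
        collected before the gap are discarded.
    """
    actions: List[List[Record]] = []
    current: List[Record] = []
    for rec in records:
        event, t, _, _ = rec
        if current and t - current[-1][1] > max_pause:
            current = []  # pause too long: not one continuous action
        if event == "MM":
            current.append(rec)
        elif event == "MC":
            if current:
                actions.append(current + [rec])  # moves plus the closing click
            current = []  # the next action starts after this click
    return actions
```

Each returned action is the list of records $\langle \text{mouse move}, t_j, x_j, y_j \rangle$ for one $(c, i)$ pair, followed by its click, which is the unit from which the angle-based metrics are computed.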

Mouse Feature description

Statistical features:

  • single click stats
  • double click stats
  • offset from ideal mouse trajectory
  • elapsed time

Behavioral features:

  • speed variation with time
  • acceleration with time
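The behavioral features "speed variation with time" and "acceleration with time" can be estimated by finite differences over consecutive records. The sketch below is a minimal illustration that assumes timestamps in milliseconds and coordinates in pixels. The function name is hypothetical, and the deck does not say how these quantities were actually computed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (timestamp in ms, x, y)


def speed_and_acceleration(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Finite-difference speed (px/ms) and acceleration (px/ms^2).

    Each speed is the straight-line distance between two consecutive points
    divided by the elapsed time; it is attributed to the midpoint of that
    time interval. Each acceleration is the change between consecutive
    speeds divided by the time between their midpoints.
    """
    segments = []  # (midpoint time, speed)
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps carry no speed info
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        segments.append(((t0 + t1) / 2, speed))
    speeds = [v for _, v in segments]
    accels = [(v1 - v0) / (m1 - m0)
              for (m0, v0), (m1, v1) in zip(segments, segments[1:])]
    return speeds, accels
```

For example, points at (0 ms, 0, 0), (10 ms, 30, 40) and (20 ms, 30, 140) give speeds of 5 and 10 px/ms and one acceleration of 0.5 px/ms².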

Different Features

  • Direction
  • Angle of Curvature
  • Curvature Distance
  • Speed
  • Pause and click

Taken together, these features differ from user to user and can be used to characterise an individual user's behaviour.
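The three angle-based features can be computed from consecutive cursor positions A, B, C. The deck does not define them, so this sketch assumes definitions commonly used in the mouse-dynamics literature:

* direction is the angle of segment AB against the x-axis;
* angle of curvature is the angle ∠ABC at the middle point;
* curvature distance is the perpendicular distance from B to line AC, divided by the length of AC.

The function names are hypothetical.

```python
import math
from typing import Tuple

P = Tuple[float, float]  # (x, y) cursor position


def direction(a: P, b: P) -> float:
    """Angle in degrees, in (-180, 180], of the segment a -> b against the x-axis."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def angle_of_curvature(a: P, b: P, c: P) -> float:
    """Angle ABC in degrees at the middle point b (180 means a straight line)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return float("nan")  # undefined when b coincides with a or c
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def curvature_distance(a: P, b: P, c: P) -> float:
    """Perpendicular distance from b to line AC, divided by the length of AC."""
    ac = math.hypot(c[0] - a[0], c[1] - a[1])
    if ac == 0:
        return float("nan")  # line AC is undefined
    # |cross(AC, AB)| equals |AC| times the height of B above line AC
    cross = (c[0] - a[0]) * (b[1] - a[1]) - (c[1] - a[1]) * (b[0] - a[0])
    return abs(cross) / (ac * ac)
```

A perfectly straight movement gives an angle of curvature of 180° and a curvature distance of 0. Sharper bends push both values away from those limits.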

Data Pre-processing

  • The data logs were cleaned manually because multiple log-ins within a log made the mouse-click records discontinuous.
  • Used the MM, MR, MP, MC… records to extract 17 features of each person's mouse usage, such as scroll_count, scroll_time, left-click frequency, right-click frequency, speedX, speedY, etc.
  • We found that 7 of the 17 features were highly correlated with the target (the class label). These 7 features were used to train the Naïve Bayes classifier.
  • We then changed the target from the user alone to user + emotion, so that the model also predicts the user's emotion during authentication.
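The deck does not say how correlation with a categorical target (a user or user + emotion label) was measured. As a plainly swapped-in stand-in, this sketch ranks features by the one-way ANOVA F statistic, which scores how strongly a feature's mean differs between classes. It then keeps the top k = 7 features and builds combined labels. The function names and the "user+emotion" label format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F statistic of one feature across class labels.

    Larger F means the class means differ more relative to the spread
    within classes, i.e. the feature says more about the label.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined without >= 2 classes and spare samples
    grand = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(rows: Sequence[Sequence[float]], names: Sequence[str],
                   labels: Sequence[str], k: int = 7) -> List[str]:
    """Names of the k features with the highest F statistic."""
    scores = {name: anova_f([r[j] for r in rows], labels)
              for j, name in enumerate(names)}
    return sorted(names, key=lambda nm: scores[nm], reverse=True)[:k]


def user_emotion_labels(users: Sequence[str], emotions: Sequence[str]) -> List[str]:
    """Combine user and emotion into one class label, e.g. 'u1+happy'."""
    return [f"{u}+{e}" for u, e in zip(users, emotions)]
```

Ranking can be repeated on the combined labels when the target changes from the user alone to user + emotion, since the most informative features may differ between the two tasks.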

Model (Naïve Bayes Classifier)

  • Abstractly, naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $(x_1, \ldots, x_n)$ of some $n$ features (independent variables), it assigns to this instance probabilities $p(C_k \mid x_1, \ldots, x_n)$ for each of $K$ possible outcomes or classes $C_k$.

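  • The problem with the above formulation is that if the number of features $n$ is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid x_1, \ldots, x_n) = \frac{p(C_k)\, p(x_1, \ldots, x_n \mid C_k)}{p(x_1, \ldots, x_n)}$$

Under the "naive" assumption that the features are conditionally independent of each other given the class, and since the denominator does not depend on $C_k$, this simplifies to

$$p(C_k \mid x_1, \ldots, x_n) \propto p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$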

The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule.

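The corresponding Bayes classifier is the function that assigns a class label $\hat{y} = C_k$ for some $k$ as follows:

$$\hat{y} = \underset{k \in \{1, \ldots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$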
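The MAP decision rule can be sketched in Python as follows. The deck does not say which likelihood model $p(x_i \mid C_k)$ was used for the continuous mouse features, so this sketch assumes a normal distribution per feature and class (Gaussian naive Bayes). The class name `GaussianNaiveBayes` and the `var_floor` parameter are hypothetical. Scores are sums of log-probabilities, which rank the classes exactly as the product does but avoid floating-point underflow when many small likelihoods are multiplied.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Naive Bayes with per-class, per-feature Gaussian likelihoods.

    Assumption: x_i given class C_k is normally distributed.
    Prediction uses the MAP rule in log space:
        argmax_k  log p(C_k) + sum_i log p(x_i | C_k)
    """

    def __init__(self, var_floor: float = 1e-9):
        self.var_floor = var_floor  # keeps constant features from giving zero variance
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in by_class.items():
            m = len(rows)
            means = [sum(col) / m for col in zip(*rows)]
            variances = [max(sum((v - mu) ** 2 for v in col) / m, self.var_floor)
                         for col, mu in zip(zip(*rows), means)]
            # store log prior p(C_k) = class frequency, plus Gaussian parameters
            self.params[label] = (math.log(m / n), means, variances)
        return self

    def _log_score(self, x: Sequence[float], label: str) -> float:
        """Unnormalised log posterior: log p(C_k) + sum_i log p(x_i | C_k)."""
        log_prior, means, variances = self.params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
                      for xi, mu, var in zip(x, means, variances))
        return log_prior + log_lik

    def predict(self, x: Sequence[float]) -> str:
        """MAP rule: the class with the highest log score."""
        return max(self.params, key=lambda label: self._log_score(x, label))
```

Usage: `GaussianNaiveBayes().fit(X_train, y_train).predict(x)` returns a label such as a user name, or a combined user + emotion label if the model was trained on those.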

Model Testing Statistics

Series 1 – User authentication only

Series 2 – Mood analysis only

User authentication and mood analysis results using 5-fold cross-validation.

Series 1 (blue) = User authentication

Series 2 (orange) = Mood analysis
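The 5-fold validation procedure can be sketched as follows. Samples are shuffled and dealt into 5 folds. Each fold is held out once for testing while the model is trained on the other four, and the per-fold accuracies are collected. This sketch assumes plain (non-stratified) k-fold with a fixed seed; the deck does not specify either. The function names and the `train_and_predict` callback are hypothetical, and any classifier, for example a naive Bayes model, can be plugged in through that callback.

```python
import random
from typing import Callable, List, Sequence

# train_and_predict(X_train, y_train, X_test) -> predicted labels for X_test
TrainAndPredict = Callable[[List, List, List], List]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    if n < k:
        raise ValueError("need at least k samples for k folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(train_and_predict: TrainAndPredict, X: Sequence,
                       y: Sequence, k: int = 5, seed: int = 0) -> List[float]:
    """Accuracy on each held-out fold; every sample is tested exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores
```

Running the same folds once with user labels and once with mood labels yields two series of per-fold accuracies, which is the kind of comparison the chart above plots.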

Discussion

We initially attributed the failure to reach the desired accuracy to flaws in our data pre-processing. However, after training other models, such as binary classifiers, decision trees and KNN, on the same pre-processed data, we obtained promising results for both user authentication and emotion analysis. We therefore concluded that the low accuracy was due to limitations of the Naïve Bayes classifier itself. The reasons are as follows:

  1. Dependency of Features – Naïve Bayes assumes that the features are conditionally independent of each other given the class. Because all our features are derived from the same mouse log data, they tend to be correlated. The classifier therefore effectively counts the same evidence several times and becomes overconfident.

  2. Data Scarcity – For every possible value of a feature, a likelihood has to be estimated from frequency counts (a frequentist approach). With scarce data, these estimates can be pushed towards 0 or 1, which in turn leads to numerical instability and worse results.
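Both limitations can be seen in a toy two-class computation. In the first, feeding the same piece of evidence twice (as with two perfectly correlated features) doubles its log-likelihood contribution and pushes the posterior from 0.8 towards 1. In the second, a pure frequency estimate assigns probability exactly 0 to a value never seen in training. Additive (Laplace) smoothing, shown with `alpha = 1`, is a standard remedy for that second problem, but it is not something the deck reports using. All function names and numbers here are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def posterior_two_class(log_lik_a: float, log_lik_b: float,
                        prior_a: float = 0.5) -> float:
    """P(class A | evidence) from summed log-likelihoods, naive Bayes style."""
    la = math.log(prior_a) + log_lik_a
    lb = math.log(1 - prior_a) + log_lik_b
    m = max(la, lb)  # subtract the max before exp() for numerical stability
    return math.exp(la - m) / (math.exp(la - m) + math.exp(lb - m))


def category_likelihoods(values: Sequence[str], vocabulary: Sequence[str],
                         alpha: float = 0.0) -> Dict[str, float]:
    """Estimate p(value | class) from counts with additive smoothing alpha.

    alpha = 0 is the plain frequency (maximum-likelihood) estimate;
    alpha = 1 is Laplace smoothing.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(vocabulary)
    if total == 0:
        raise ValueError("no data and no smoothing")
    return {v: (counts[v] + alpha) / total for v in vocabulary}


if __name__ == "__main__":
    # Limitation 1: one feature favouring A with likelihoods 0.8 vs 0.2 ...
    once = posterior_two_class(math.log(0.8), math.log(0.2))
    # ... and the same evidence counted twice via a duplicated feature
    twice = posterior_two_class(2 * math.log(0.8), 2 * math.log(0.2))
    print(once, twice)

    # Limitation 2: an unseen value gets probability 0 without smoothing
    seen = ["low", "low", "high"]
    vocab = ["low", "high", "none"]
    print(category_likelihoods(seen, vocab))
    print(category_likelihoods(seen, vocab, alpha=1.0))
```

A single zero likelihood forces the whole product $p(C_k) \prod_i p(x_i \mid C_k)$ to zero, so one unseen value can veto a class regardless of all other evidence.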
