1 of 110

Data Science Application of Artificial Intelligence/Machine Learning

Saurabh Srivastava

saurabhnitkian@gmail.com

+xxxxxxxxxxxxxxx

vscentrum.be

2 of 110

Part-1:

Data Science


3 of 110

DATA


There is not a single big industry that does not rely on data and the insights gained through them

4 of 110

Part-1:

The Theory Behind


5 of 110

Introduction

    • Machine Learning: a subfield of Artificial Intelligence and Computer Science
      • It has imported much relevant knowledge from statistics and probability theory
      • It has made computer scientists better able to handle data-analysis problems


[Diagram: Data Science at the intersection of Statistics, Big Data (incremental user data, incremental sensor data, large distributed databases), Artificial Intelligence, and Machine Learning.]

6 of 110

Relation: Data Science and Big Data

  • Machine Learning mostly depends on inferential statistics, which draws conclusions about populations from studies of samples, in contrast to descriptive statistics, which primarily summarizes samples.
  • Data Science, as for Statistics, is assumed to cover:
    • Data collection/data capturing/data harvesting
    • Data modeling
    • Data maintenance
    • Data analysis/data processing
    • Visualization/presentation of data, and decision-making based on data
  • Big Data (TB/ZB; variety, quality, speed) primarily refers to the storage, maintenance, and access to data.
    • The Big Data area builds on more traditional areas such as very large databases, data warehousing, and distributed databases.


7 of 110

Some Background….

  • Artificial Intelligence has 62-year-old roots
    • The area was named and defined at a summer workshop in 1956.
    • This happened little more than a decade after the advent of the first computer.
    • A small group of computer scientists gathered at Dartmouth College in New Hampshire, US.

Agenda: "The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.

An attempt will be made to find out how to make machines:

  • use language
  • form abstractions and concepts
  • solve kinds of problems now reserved for humans and improve themselves"


8 of 110

Some Background….


Founding Fathers of Artificial Intelligence in 1956

Claude Shannon: Founder of Information and Communication Theory
D.M. Mackay: British researcher in Information Theory and brain organization
Julian Bigelow: Chief engineer for the von Neumann computer at Princeton in 1946
Nathaniel Rochester: Author of the first assembler for the first commercial computer
Oliver Selfridge: Named 'the father of Machine Perception'
Ray Solomonoff: Inventor of Algorithmic Probability
John Holland: The inventor of Genetic Algorithms
Marvin Minsky: Key MIT researcher in the early development of AI
Allen Newell: Champion of symbolic AI and inventor of central AI techniques
Herbert Simon: Pioneer in decision-making theory and a Nobel Prize winner
John McCarthy: Inventor of the LISP programming language

9 of 110

Some Background….

  • In 1955 Allen Newell, Herbert A. Simon and Cliff Shaw created the Logic Theorist, the first program deliberately engineered to mimic the problem-solving skills of a human being. It is called "the first Artificial Intelligence program". It could prove theorems in Whitehead and Russell's Principia Mathematica, and it introduced key artificial intelligence techniques such as list processing and heuristic search.
  • In 1958 John McCarthy created the first version of LISP, based on Lambda Calculus and using list processing; it is the second-oldest high-level programming language in widespread use today (only Fortran is older, by one year).
  • Oliver Selfridge created the Pandemonium architecture in 1959, one of the first computational models of pattern recognition for images.
  • In 1959 Simon, Newell and Shaw created the General Problem Solver (GPS), a computer program intended to work as a universal problem-solving machine. Any problem that can be expressed as a set of well-formed formulas (WFFs) or Horn clauses can, in principle, be solved by GPS.

  • Arthur Samuel coined the term Machine Learning in 1959.
  • McCulloch and Pitts introduced Neural Networks as a model of computation as early as 1943.
  • Marvin Minsky and Dean Edmonds built SNARC, the first Neural Network machine able to learn, in 1951.
  • Frank Rosenblatt invented the Perceptron in 1957.


10 of 110

Some Background….


LG has launched the ThinQ AI-focused TV brand.

Huawei, Samsung and Qualcomm launch AI-powered smartphones.

Burger King boosts 'AI-written' ads.

A majority of real current Artificial Intelligence success stories relate to the application of Machine Learning only!

A majority of current Machine Learning success stories relate to image and speech processing!

11 of 110

ML application sectors


General application sectors:

  • Medical diagnosis, personalized treatments and drug design
  • Driverless vehicles and household robots
  • Personal assistants, recommender systems and navigators
  • Adapting Communications and Social media services
  • Marketing and sales
  • Optimization of technical processes
  • Monitoring and surveillance
  • Financial services
  • Cyber security
  • Machine translation

Specific categories of data analysis:

  1. Image Recognition – Computer vision (image analysis for diagnosis of breast cancer)
  2. Speech Recognition (filing medical records)
  3. Data-mining for Large Datasets (large clinical databases)
  4. Text-mining of Large Document Collections (new medical publications to update medical expert systems)
  5. Dynamic adaptation of technical systems (training of robot movements for surgical robots)

12 of 110

Data Analysis for ML


Data Analysis

The End-to-end process for Real-World problems

In a typical machine learning application, practitioners must apply the appropriate:

  • Data harvesting from potentially heterogeneous sources
  • Pre-processing of data (e.g. from analog to digital form)
  • Model or theory support
  • Feature engineering
  • Algorithm selection 
  • Tailoring conditions for algorithms

(hyper-parameter settings, language biases, complexity) 

  • Core analysis phase
  • Post-processing of acquired knowledge
  • Visualization and preparation of material for online updating and decision making.


13 of 110

Regression & Classification

  • Main Scenarios for Data Analysis
    • Regression: establishing prognosis of future states
    • Classification: establishing concepts for classifying in future situations
  • Regression is a technique from statistics used to predict values of a desired target quantity when the target quantity is continuous.
  • Classification predicts discrete values: the data is categorized under different labels according to some parameters, and the labels are then predicted for new data.
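The contrast can be made concrete with a minimal scikit-learn sketch (assuming scikit-learn is installed; the four-point toy data and the 'small'/'large' labels are made up for illustration):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]

# Regression: the target is continuous, the prediction is a number.
y_cont = [1.0, 2.0, 3.0, 4.0]
reg = LinearRegression().fit(X, y_cont)
print(reg.predict([[5]]))      # a continuous prognosis, close to 5.0

# Classification: the target is a discrete label, the prediction is a label.
y_lab = ["small", "small", "large", "large"]
clf = LogisticRegression().fit(X, y_lab)
print(clf.predict([[5]]))      # a label for the new input
```

The same input data serves both scenarios; only the type of target quantity differs.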


14 of 110

Objects & Features

  • Object: Thing, Entity, Observation, Data, Data-item, Record, Tuple, Instance, Example
  • Feature: Property, Attribute, Characteristic, Variable, Output Variable, Predictor, Target, Category
  • The Object (Data, Observation) Language is the chosen language (formalism) in which objects and features are described.
  • Types of Features:
    • Ordinal (binary)
    • Discrete numerical (integers)
    • Continuous numerical (real numbers)
    • Symbolic
    • Structural (e.g. graphs or lists)

    • ZOO dataset (from UCI ML repository):
    • Naive and partial classification of animals
    • 107 objects characterized by 18 features classified in 7 categories


Category structure (Animal)

Mammal(#1)

Bird(#2)

Reptile(#3)

Fish(#4)

Amphibian(#5)

Insect(#6)

Invertebrate(#7)

15 of 110

Object and Feature

Synonyms for Object: Thing, Entity, Observation, Data, Data-item, Record, Tuple, Row, Vector, Instance (training instance), Example (training example)

Synonyms for Feature: Property, Attribute, Characteristic, Field, Column, Variable (Output Variable), Independent variable (Predictor Variable), Target or Category feature



16 of 110


Object space: also called Instance space or Population.

Subsets of the object space available for learning: Sample (Training sample, Statistical sample); Data-set (Table, Array); Training example set.

17 of 110


Example from the ZOO Dataset

The Object space or population is the set of all potential feature vectors with feature values as can be expressed in the ZOO object language.

The Sample or Data-set is the whole set of ZOO feature vectors.

The Extension of the Concept of buffalo is the set of all buffalos in real life.

18 of 110

Objects & Features


Features

animal_name, hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic, catsize, class_type.

All features are Boolean except animal_name, which is text, and class_type and legs, which are integers.

Example from the ZOO Dataset: buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1

Example of Object and Feature vector

The Object Language in this case is the specific formalism for specifying Feature vectors

animal_name buffalo category symbolic feature

Hair 1 predictor ordinal feature

Feathers 0 predictor ordinal feature

Eggs 0 predictor ordinal feature

Milk 1 predictor ordinal feature

Airborne 0 predictor ordinal feature

Aquatic 0 predictor ordinal feature

Predator 0 predictor ordinal feature

Toothed 1 predictor ordinal feature

Backbone 1 predictor ordinal feature

Breathes 1 predictor ordinal feature

Venomous 0 predictor ordinal feature

Fins 0 predictor ordinal feature

Legs 4 predictor discrete numerical feature

Tail 1 predictor ordinal feature

Domestic 0 predictor ordinal feature

Catsize 1 predictor ordinal feature

class_type 1 category discrete numerical feature

19 of 110

Generalization


[Diagram: an Object is an instance of a Category definition. The subset of the Data-set consistent with the category definition is, in turn, a subset of the subset of the Object space consistent with the category definition (relations shown: instance-of, element-of, subset-of).]

20 of 110


Example from the ZOO Dataset

Example of a concept definition. The Hypothesis Language is the same as the Object Language, apart from the introduction of a wildcard (?) for ordinal feature values.

fish 0,0,1,0,0,1,?,1,1,0,?,1,0,1,?,?,4

tuna, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4

stingray, 0,0,1,0,0,1,1,1,1,0,1,1,0,1,0,1,4

seahorse, 0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4

pike, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4

piranha, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4

herring, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4

haddock, 0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4

dogfish, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4

chub, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4

catfish, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4

carp, 0,0,1,0,0,1,0,1,1,0,0,1,0,1,1,0,4

bass, 0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4

21 of 110

The ZOO dataset (107 Objects)


aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1 antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4 bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1 boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 calf,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1 carp,0,0,1,0,0,1,0,1,1,0,0,1,0,1,1,0,4 catfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4 cavy,1,0,0,1,0,0,0,1,1,1,0,0,4,0,1,0,1 cheetah,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2 chub,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4 clam,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,7 crab,0,0,1,0,0,1,1,0,0,0,0,0,4,0,0,0,7 crayfish,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7 crow,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2 deer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 dogfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4 dolphin,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1 dove,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2 duck,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,0,2 elephant,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 flamingo,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,1,2 flea,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6 frog,0,0,1,0,0,1,1,1,1,1,0,0,4,0,0,0,5 frog,0,0,1,0,0,1,1,1,1,1,1,0,4,0,0,0,5 fruitbat,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1 giraffe,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 girl,1,0,0,1,0,0,1,1,1,1,0,0,2,0,1,1,1 gnat,0,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6 goat,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1 gorilla,1,0,0,1,0,0,0,1,1,1,0,0,2,0,0,1,1 gull,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2 haddock,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4 hamster,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,0,1 hare,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1 hawk,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2 herring,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4 honeybee,1,0,1,0,1,0,0,0,0,1,1,0,6,0,1,0,6 housefly,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6 kiwi,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,0,2 ladybird,0,0,1,0,1,0,1,0,0,1,0,0,6,0,0,0,6 lark,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2 leopard,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 lion,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 lobster,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7 lynx,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 mink,1,0,0,1,0,1,1,1,1,1,0,0,4,1,0,1,1 
mole,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1 mongoose,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 moth,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6 newt,0,0,1,0,0,1,1,1,1,1,0,0,4,1,0,0,5 octopus,0,0,1,0,0,1,1,0,0,0,0,0,8,0,0,1,7 opossum,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1 oryx,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1 ostrich,0,1,1,0,0,0,0,0,1,1,0,0,2,1,0,1,2 parakeet,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2 penguin,0,1,1,0,0,1,1,0,1,1,0,0,2,1,0,1,2 pheasant,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2 pike,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4 piranha,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4 pitviper,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,3 platypus,1,0,1,1,0,1,1,0,1,1,0,0,4,1,0,1,1 polecat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 pony,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1 porpoise,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1 puma,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 pussycat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,1,1,1 raccoon,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 reindeer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1 rhea,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,1,2 scorpion,0,0,0,0,0,0,1,0,0,1,1,0,8,1,0,0,7 seahorse,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4 seal,1,0,0,1,0,1,1,1,1,1,0,1,0,0,0,1,1 sealion,1,0,0,1,0,1,1,1,1,1,0,1,2,1,0,1,1 seasnake,0,0,0,0,0,1,1,1,1,0,1,0,0,1,0,0,3 seawasp,0,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,7 skimmer,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2 skua,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2 slowworm,0,0,1,0,0,0,1,1,1,1,0,0,0,1,0,0,3 slug,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7 sole,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4 sparrow,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2 squirrel,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,0,1 starfish,0,0,1,0,0,1,1,0,0,0,0,0,5,0,0,0,7 stingray,0,0,1,0,0,1,1,1,1,0,1,1,0,1,0,1,4 swan,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,1,2 termite,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6 toad,0,0,1,0,0,1,0,1,1,1,0,0,4,0,0,0,5 tortoise,0,0,1,0,0,0,0,0,1,1,0,0,4,1,0,1,3 tuatara,0,0,1,0,0,0,1,1,1,1,0,0,4,1,0,0,3 tuna,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4 vampire,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1 vole,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1 vulture,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,1,2 wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1 
wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6 wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1 worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7 wren,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2

22 of 110

Classification Task

  • In terms of Features, Feature vectors, and the Object (Feature) Space, the classical way of viewing a scenario for a learning task is to:
    • Define an appropriate set of Features
    • View each data-item as a Feature vector
    • Consider the Feature (Object) Space spanned by the Features
    • Populate the Feature space with the Feature vectors (data-items)
    • Find optimal multi-dimensional surfaces (hyperplanes) in the Object Space that circumscribe the extensions of all concepts involved
  • The engineering of Features is crucial for the complexity of the Object Space and, as a consequence, also crucial for the complexity of the learning problem.
  • Very often, data-items are of a non-digital nature, and relevant features need to be extracted from the data-items as a separate process.


23 of 110

Dimensionality Reduction


[Diagram: a transformation maps the original Features (Dimension-1 … Dimension-k) to new Features' (Dimension-1' … Dimension-k').]

Principal Component Analysis: Dimension Reduction (compression etc.)

24 of 110

PCA Example


[Plot: principal components ordered by variance. PC1 has the largest variance (most information); PC7 has the smallest variance (least information).]
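A minimal PCA sketch with scikit-learn, assuming 7-dimensional data as in the plot above (the random data here is a made-up stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))   # 100 objects, 7 features
X[:, 0] *= 10                   # give one direction much larger variance

# PCA orders components by decreasing variance: PC1 carries the most
# information, the last PC the least.
pca = PCA(n_components=7).fit(X)
ratios = pca.explained_variance_ratio_
print(ratios)                   # fractions of total variance, decreasing

# Keeping only the first few components compresses the data.
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)          # (100, 2)
```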

25 of 110

Feature Selection/ Reduction


Each image is a Data-item

Feature Selection:

Features can be derived in a variety of manners, ranging from totally manual, via manual/automatic hybrids, to totally automated.

In the automated case, every non-digital form of representation demands its own specialized techniques.

Dimensionality/feature reduction serves:

        • making models easier for human users to interpret
        • avoiding the curse of dimensionality
        • reducing the risk of overfitting
        • shortening the computation times of learning processes

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (typically with hundreds or thousands of dimensions)

26 of 110

Over-Fitting Vs. Under-Fitting

  • Over-fitting is the production of a model that corresponds too closely or exactly to a particular data-set, and may therefore fail to fit additional data or predict future observations reliably. An over-fitted model is a model that contains more features than can be justified by the data-set.
  • Under-fitting occurs when a set of features cannot adequately capture the available data-set. An under-fitted model is a model where some features that would appear in a correctly specified model are missing. Such a model will tend to have poor predictive performance.


27 of 110

Feature Selection Vs. Feature Extraction

  • Feature selection is the process of selecting a subset of relevant features from the original set. The three main criteria for selecting a feature are:
    • Informativeness
    • Relevance
    • Non-redundancy

  • Feature extraction is the process of deriving new features either as simple combinations of original features or as a more complex mapping from the original set to the new set.

  • In both cases, the learning task is supposed to be more tractable in the resulting feature space than in the original.
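The two approaches can be sketched side by side with scikit-learn (assumed available; the data and the choice of SelectKBest for selection and PCA for extraction are illustrative, not prescribed by the slides):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two features carry signal

# Feature selection: keep a subset of the ORIGINAL features.
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(selector.get_support())            # boolean mask over the 5 features

# Feature extraction: derive NEW features as combinations of the originals.
X_new = PCA(n_components=2).fit_transform(X)
print(X_new.shape)                       # (60, 2)
```

In both sketches the learning task continues in a 2-dimensional feature space instead of the original 5-dimensional one.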


28 of 110

Machine Learning Tasks

  • Supervised learning
    • Regression: predict numerical values
    • Classification: predict categorical values, i.e., labels
  • Unsupervised learning
    • Clustering: group data according to "distance"
    • Association: find frequent co-occurrences
    • Link prediction: discover relationships in data
    • Data reduction: project features to fewer features
  • Reinforcement learning
    • Reward-based state prediction of agent in an environment, to maximize the cumulative rewards.


29 of 110

Regression

Colorize B&W images automatically

https://tinyclouds.org/colorize/


30 of 110

Classification


31 of 110

Reinforcement learning


32 of 110

Clustering


33 of 110

Applications in Science


34 of 110

Machine Learning Algorithms


35 of 110


  • Classification techniques predict categorical responses, for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Classification models classify input data into categories. Typical applications include medical imaging, image and speech recognition, and credit scoring.
  • Regression techniques predict continuous responses, for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
  • Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from datasets consisting of input data without labeled responses. Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data. Applications for clustering include gene sequence analysis, market research, and object recognition.

36 of 110

Not Always Perfect…..

  • Many machine learning/AI projects fail (Gartner claims 85 %)

  • Ethics, e.g., Amazon has/had sub-par employees fired by an AI automatically


37 of 110

Failure Reasons

  • Asking the wrong question
  • Trying to solve the wrong problem
  • Not having enough data
  • Not having the right data
  • Having too much data
  • Hiring the wrong people
  • Using the wrong tools
  • Not having the right model
  • Not having the right yardstick


38 of 110

Implementation

  • Programming languages
    • Python
    • R
    • C++
    • ...
  • Many libraries
    • scikit-learn
    • PyTorch
    • TensorFlow
    • Keras


scikit-learn: classic machine learning. PyTorch, TensorFlow, Keras: deep learning frameworks.

Fast-evolving ecosystem!

39 of 110

Scikit-learn

  • Nice end-to-end framework
    • Data exploration (+ pandas + holoviews)
    • Data preprocessing (+ pandas)
      • Cleaning/missing values
      • Normalization
    • Training
    • Testing
    • Application
  • "Classic" machine learning only
  • https://scikit-learn.org/stable/


40 of 110

Keras (TensorFlow)

  • High-level framework for deep learning
  • TensorFlow backend
  • Layer types
    • Dense
    • Convolutional
    • Pooling
    • Embedding
    • Recurrent
    • Activation
  • https://keras.io/


41 of 110

Procedure

  • Data ingestion
    • CSV/JSON/XML/H5 files, RDBMS, NoSQL, HTTP,...
  • Data cleaning
    • Outliers/invalid values? → filter
    • Missing values? → impute
  • Data transformation
    • Scaling/Normalization


Must be done systematically

42 of 110

Supervised Learning: Methodology

  • Select model, e.g., random forest, (deep) neural network, ...
  • Train model, i.e., determine parameters
    • Data: input + output
      • Training data → determine model parameters
      • Validation data → yardstick to avoid overfitting
  • Test model
    • Data: input + output
      • Testing data → final scoring of the model
  • Production
    • Data: input → predict output


43 of 110

From Neurons to ANNs

[Diagram: a biological neuron as the inspiration for an artificial neuron, in which many weighted inputs are summed and passed through an activation function.]

44 of 110

From ANNs to DNNs


How to determine weights?

45 of 110

Training: Backpropagation

  • Initialize weights "randomly"
  • For all training epochs
    • for all input-output in training set
      • using input, compute output (forward)
      • compare computed output with training output
      • adapt weights (backward) to improve output
    • if accuracy is good enough, stop


Example: a dataset with 200 samples (rows of data), a batch size of 5, and 1,000 epochs.

  • This means that the dataset will be divided into 40 batches, each with five samples.
  • The model weights will be updated after each batch of five samples.
  • This also means that one epoch will involve 40 batches or 40 updates to the model.
  • With 1,000 epochs, the model will be exposed to or pass through the whole dataset 1,000 times. That is a total of 40,000 batches during the entire training process.
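The batch/epoch arithmetic above can be spelled out in a few lines (no assumptions beyond the numbers in the example):

```python
# Numbers taken from the example: 200 samples, batch size 5, 1,000 epochs.
samples, batch_size, epochs = 200, 5, 1000

batches_per_epoch = samples // batch_size   # the dataset splits into 40 batches
updates_per_epoch = batches_per_epoch       # one weight update per batch
total_batches = batches_per_epoch * epochs  # 40,000 batches over all training

print(batches_per_epoch, updates_per_epoch, total_batches)
```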

46 of 110

Deep neural networks

  • Many layers
  • Features are learned, not given
  • Low-level features combined into�high-level features

  • Special types of layers
    • Convolutional
    • Drop-out
    • Recurrent
    • ...


47 of 110

Convolutional neural networks


 

48 of 110

Convolution examples


 

 

 

 

49 of 110

Sentiment Classification

  • Input data
    • movie review (English)
  • Output data

  • Training examples
  • Test examples



<start> this film was just brilliant casting location

scenery story direction everyone's really suited the part

they played and you could just imagine being there Robert

redford's is an amazing actor and now the same being director

norman's father came from the same scottish island as myself

so i loved the fact there was a real connection with this

film the witty remarks throughout the film were great it was

just brilliant so much that i bought the film as soon as it

50 of 110

Quill Bot

  • Represent words as one-hot vectors, length = vocabulary size

  • Word embeddings
    • dense vector
    • vector distance ≈ semantic distance

  • Training
    • use context
    • discover relations with surrounding words


Issues with one-hot vectors:

  • Unwieldy (large in size)
  • no semantics

51 of 110

Part-2:

A working example (with Python)


52 of 110

Working Example

      • Python: high-level, interpreted, general-purpose programming language
      • Jupyter notebook: a web application for creating and sharing computational documents.
      • Python libraries

pandas: functions for analyzing, cleaning, exploring, and manipulating data

numpy: the fundamental package for scientific computing in Python

matplotlib.pyplot: a collection of functions that make matplotlib work like MATLAB

seaborn: a library for making statistical graphics in Python; it builds on top of matplotlib and integrates closely with pandas data structures


Python commands

  • import pandas as pd
  • import numpy as np
  • import matplotlib.pyplot as plt
  • import seaborn as sns

In Python, an alias (e.g. pd) is an alternate name for referring to the same thing.

53 of 110

Importing the Dataset

  • Import the heart disease dataset from the link:

https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data

  • cols = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num']
  • data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data', names = cols)

Open Jupyter Notebook


54 of 110


55 of 110


Click on “new”

56 of 110


Select Python 3 (ipykernel)

New Tab/Window appears

57 of 110


After entering commands (one at a time) here, click on the “Run” button

58 of 110

  • NumPy: fundamental package for scientific computing. It includes functionality for
    • Multidimensional arrays
    • High-level mathematical functions (linear algebra, Fourier transform, pseudorandom number generation)
    • In scikit-learn (sklearn), the NumPy array is the fundamental data structure.
    • Scikit-learn provides clean datasets and takes data in the form of NumPy arrays (all data needs to be converted into NumPy arrays).

    • SciPy: collection of functions for scientific computing in Python (advanced linear algebra, mathematical function optimization, signal processing, special mathematical functions, and statistical distributions)
      • When do we require SciPy? For example, when a 2-D array with a lot of zeros (a sparse array) needs to be stored.

59 of 110

  • Convert a NumPy array to a SciPy sparse matrix in CSR (Compressed Sparse Row) format
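The conversion shown on the slide (screenshot lost) can be sketched like this; the 4×4 identity matrix is just an illustrative sparse array:

```python
import numpy as np
from scipy.sparse import csr_matrix

eye = np.eye(4)               # 4x4 identity: 16 cells, only 4 non-zero
sparse = csr_matrix(eye)      # CSR stores just the non-zero entries
print(sparse.nnz)             # number of stored (non-zero) values
print(sparse.toarray())      # back to a dense array when needed
```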


60 of 110

  • Matplotlib: Primary scientific plotting library in python. It provides functions for:
      • Publication-quality visualizations (line charts, histograms, scatter plots etc.)


%matplotlib is a Jupyter magic command.

%matplotlib inline displays static images of plots embedded in the notebook.

%matplotlib notebook gives interactive plots embedded within the notebook.

With a magic command, plt.show() is not required.

61 of 110


[Screenshot: the interactive matplotlib toolbar: reset original view; back; forward; pan (left button pans, right button zooms; x/y fixes an axis, CTRL fixes aspect); zoom to rectangle (x/y fixes axis); download plot.]

62 of 110

  • Pandas: Python library for data wrangling and analysis.
    • Built around the data structure called DataFrame.
    • Similar to a table in an Excel spreadsheet.
    • Pandas provides operations on tables, where each column can have a different type (not possible in NumPy).


Data wrangling: process of removing errors and combining complex data sets to make them more accessible and easier to analyze.

63 of 110

Simple illustration with scikit-learn iris dataset

  • Toy datasets (6 in total) can be found in sklearn.datasets


load_iris returns a Bunch object instead of a tabular format. A Bunch has keys (for lookup) and values, similar to a dictionary. iris_dataset has 8 keys.

'data' (all the feature data in a NumPy array) & 'target' (the variable to predict, in a NumPy array)
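The notebook steps described above (screenshots lost) amount to something like the following, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()               # a dictionary-like Bunch object
print(list(iris_dataset.keys()))         # keys available for lookup
print(iris_dataset['data'].shape)        # (150, 4): 150 objects, 4 features
print(iris_dataset['target'][:5])        # class labels encoded as 0/1/2
print(iris_dataset['target_names'])      # 0=setosa, 1=versicolor, 2=virginica
```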

64 of 110

The ‘data’ key & ‘target’ key


150 rows (entries), each with 4 attributes (features) => define specific ‘target’ key (classification)

0 means setosa;

1 means versicolor;

2 means virginica

65 of 110


66 of 110


67 of 110

Importing a Custom Dataset (Excel worksheet)


Pickle is used for serializing and de-serializing Python object structures, also called marshalling or flattening. Serialization refers to the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network.

68 of 110


https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleveland.data

69 of 110


Index of /ml/machine-learning-databases/heart-disease

Download: processed.cleveland.data

Open (with File/Open):

processed.cleveland.data

70 of 110


Dataset features

71 of 110


Used Predictors

All Predictors

Dataset Specifications

72 of 110


Out of 76, 14 attributes (features) used

‘age’= 63.0, ‘sex’ =1.0, ‘cp’=1.0, ‘trestbps’= 145.0, ‘chol’ = 233.0, ‘fbs’ = 1.0, ‘restecg’ = 2.0, ‘thalach’ = 150.0, ‘exang’ = 0.0, ‘oldpeak’= 2.3, ‘slope’= 3.0, ‘ca’=0.0, ‘thal’=6.0, ‘num’=0

Seaborn: Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

73 of 110


Read the dataset

Determine dataset type

Predictors

74 of 110


Last 10 Values of the dataset

75 of 110


Shape (rows, columns) of dataset

76 of 110


Describe statistical characteristics of dataset

77 of 110


Check for missing values (if any)

78 of 110


Check for missing ‘?’ values

‘?’ values in ‘ca’ and ‘thal’ predictors

79 of 110


Handle missing ‘?’ values using SimpleImputer

Definition

Impute: to attribute (assign) a substitute value to a missing entry

80 of 110


Now no ‘?’ values in ‘ca’ and ‘thal’

81 of 110


Again check for missing values (if any)

4 missing values in ‘ca’

2 missing values in ‘thal’

82 of 110


Replace missing values with the mean value

The imputer returns a NumPy array, not a DataFrame

83 of 110


Convert the numpy.ndarray back to a pandas DataFrame

# while using pd.read_csv() we use names = cols

# while using pd.DataFrame() we use columns = cols
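The cleaning steps of the last few slides, in one self-contained sketch. The three-row frame below is a made-up stand-in for the heart-disease data (one clean column plus a 'ca' column containing a '?' marker):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

cols = ['age', 'ca']
data = pd.DataFrame([[63.0, '0.0'], [67.0, '?'], [41.0, '3.0']], columns=cols)

# '?' markers become NaN so the imputer recognizes them as missing values.
data = data.replace('?', np.nan).astype(float)

# Replace missing values with the column mean; this returns a NumPy array...
imputer = SimpleImputer(strategy='mean')
arr = imputer.fit_transform(data)

# ...so convert back to a DataFrame (note columns=cols, not names=cols).
data = pd.DataFrame(arr, columns=cols)
print(data['ca'].tolist())   # the '?' became the column mean (0.0+3.0)/2 = 1.5
```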

84 of 110


No missing values, '?' or NaN remain

The data is again a pandas DataFrame

(Examining and Cleaning Data)

85 of 110


# unique values (Classes) in predictor num

86 of 110


5 Classes converted to 2 classes (Binary classification)

Heart Disease: Yes /No

Binary Classification
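Collapsing the five 'num' classes to two (screenshot lost) can be sketched as follows; 0 stays "no heart disease" and 1-4 all become 1 ("heart disease"):

```python
import pandas as pd

# Made-up sample of the 'num' target column with values 0-4.
data = pd.DataFrame({'num': [0, 2, 1, 0, 4, 3]})

# Any non-zero class indicates heart disease -> map to 1.
data['num'] = (data['num'] > 0).astype(int)
print(sorted(data['num'].unique()))   # only the two classes 0 and 1 remain
```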

87 of 110


Check data for binary class

88 of 110


Split data into X (feature matrix) and y (target vector)

data.iloc[:,0:-1] => all rows, all columns except the last.

data.iloc[:,-1] => all rows, and only the last column.

89 of 110


Split into:

Training data

Testing data

90 of 110


Import Classifiers from corresponding model libraries in Scikit Learn

91 of 110


Build Classifier (Algorithms) Models

92 of 110


Fit models on training data

93 of 110


Determine score, i.e. accuracy on test-data
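The split/fit/score steps of these slides (screenshots lost) amount to something like the sketch below; the iris data stands in for the heart-disease frame so the example needs no download, and the choice of a k-NN classifier is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: feature matrix, y: target vector.
X, y = load_iris(return_X_y=True)

# Hold out test data; fit on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out test data
```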

94 of 110


Boxplot of variables

A box plot is a graphical rendition of statistical data based on the minimum, first quartile, median, third quartile, and maximum (additionally whiskers and outliers)

95 of 110


Import Scaler and Pipeline

Fit training data to pipeline

Calculate score (after scaling and pipelining)

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting the parameters of the various steps using their names and the parameter name separated by a '__'.
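A sketch of scaling inside a Pipeline (screenshot lost); iris again stands in for the heart data, and the SVC step is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Named steps: parameters are addressed as <step>__<param>, e.g. svc__C.
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(C=1.0))])
pipe.fit(X_train, y_train)           # the scaler is fit on training data only
print(pipe.score(X_test, y_test))    # score after scaling and pipelining
```

Fitting the scaler inside the pipeline prevents information from the test data leaking into the preprocessing step.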

96 of 110

Part-3:

Working example (with MATLAB)

97 of 110

Classification: Fisher’s Iris Data (with MATLAB)

  • load fisheriris
  • f = figure;
  • gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
  • xlabel('Sepal length');
  • ylabel('Sepal width');


>> size(meas)

ans =

150 4

98 of 110

  • The fitcdiscr function can perform classification using different types of discriminant analysis. First, classify the data using the default linear discriminant analysis (LDA).


lda = fitcdiscr(meas(:,1:2),species)

lda =

ClassificationDiscriminant

ResponseName: 'Y'

CategoricalPredictors: []

ClassNames: {'setosa' 'versicolor' 'virginica'}

ScoreTransform: 'none'

NumObservations: 150

DiscrimType: 'linear'

Mu: [3×2 double]

Coeffs: [3×3 struct]

ldaClass = resubPredict(lda); % predict the class labels for the training data

99 of 110

The observations with known class labels are usually called the training data. Now compute the resubstitution error, which is the misclassification error (the proportion of misclassified observations) on the training set.

>> ldaResubErr = resubLoss(lda)

ldaResubErr =

0.2000

You can also compute the confusion matrix on the training set. A confusion matrix contains information about known class labels and predicted class labels. Generally speaking, the (i,j) element in the confusion matrix is the number of samples whose known class label is class i and whose predicted class is j. The diagonal elements represent correctly classified observations.

Of the 150 training observations, 20% or 30 observations are misclassified by the linear discriminant function.

100 of 110

Total misclassifications

1+14+15 = 30 or 20%

Confusion Matrix

figure

ldaResubCM = confusionchart(species,ldaClass);

101 of 110

figure(f)

bad = ~strcmp(ldaClass,species);

hold on;

plot(meas(bad,1), meas(bad,2), 'kx');

hold off;

102 of 110

The function has separated the plane into regions divided by lines, and assigned different regions to different species. One way to visualize these regions is to create a grid of (x,y) values and apply the classification function to that grid.

[x,y] = meshgrid(4:.1:8,2:.1:4.5);

x = x(:);

y = y(:);

j = classify([x y],meas(:,1:2),species);

gscatter(x,y,j,'grb','sod')

103 of 110

For some data sets, the regions for the various classes are not well separated by lines. When that is the case, linear discriminant analysis is not appropriate. Instead, you can try quadratic discriminant analysis (QDA) on this data.

Compute the resubstitution error for quadratic discriminant analysis.

qda = fitcdiscr(meas(:,1:2),species,'DiscrimType','quadratic');

qdaResubErr = resubLoss(qda)

qdaResubErr =

0.2000

This is still only the resubstitution error. A better measure of performance is the test error (also referred to as generalization error), which is the expected prediction error on an independent data set.

104 of 110

  • In this case you don't have another labeled data set, but you can simulate one by cross-validation.
  • A 10-fold cross-validation is a popular choice for estimating the test error on classification algorithms.
  • It randomly divides the training set into 10 disjoint subsets.
  • Each subset has roughly equal size and roughly the same class proportions as in the training set.
  • Remove one subset, train the classification model using the other nine subsets, and use the trained model to classify the removed subset.
  • This is repeated by removing each of the ten subsets one at a time.

Because cross-validation randomly divides data, its outcome depends on the initial random seed. To reproduce the exact results in this example, execute the following command:

105 of 110

rng(0,'twister');

cp = cvpartition(species,'KFold',10)

cp =

K-fold cross validation partition

NumObservations: 150

NumTestSets: 10

TrainSize: 135 135 135 135 135 135 135 135 135 135

TestSize: 15 15 15 15 15 15 15 15 15 15

The crossval and kfoldLoss methods can estimate the misclassification error for both LDA and QDA using the given data partition cp.

Estimate the true test error for LDA using 10-fold stratified cross-validation.

106 of 110

cvlda = crossval(lda,'CVPartition',cp);

ldaCVErr = kfoldLoss(cvlda)

ldaCVErr =

0.2000

The LDA cross-validation error has the same value as the LDA resubstitution error on this data.

Estimate the true test error for QDA using 10-fold stratified cross-validation.

cvqda = crossval(qda,'CVPartition',cp);

qdaCVErr = kfoldLoss(cvqda)

qdaCVErr =

0.2200

QDA has a slightly larger cross-validation error than LDA. This shows that a simpler model can get comparable or better performance than a more complicated one.

107 of 110

Naive Bayes classifiers are among the most popular classifiers

The fitcnb function can be used to create a more general type of naive Bayes classifier.

First model each variable in each class using a Gaussian distribution. Then, you can compute the resubstitution error and the cross-validation error.

nbGau = fitcnb(meas(:,1:2), species);

nbGauResubErr = resubLoss(nbGau)

nbGauResubErr =

0.2200

nbGauCV = crossval(nbGau, 'CVPartition',cp);

nbGauCVErr = kfoldLoss(nbGauCV)

labels = predict(nbGau, [x y]);

gscatter(x,y,labels,'grb','sod')

108 of 110

We assumed the variables in each class to follow a multivariate normal distribution, but sometimes that assumption is not valid. Now try to model each variable in each class using a kernel density estimate, a more flexible nonparametric technique, setting the kernel to a box kernel:

nbKD = fitcnb(meas(:,1:2), species, 'DistributionNames','kernel', 'Kernel','box');

nbKDResubErr = resubLoss(nbKD)

nbKDResubErr = 0.2067

109 of 110

nbKDCV = crossval(nbKD, 'CVPartition',cp);

nbKDCVErr = kfoldLoss(nbKDCV)

nbKDCVErr = 0.2133

labels = predict(nbKD, [x y]);

gscatter(x,y,labels,'rgb','osd')

For this data set, the naive Bayes classifier with kernel density estimation gets smaller resubstitution error and cross-validation error than the naive Bayes classifier with a Gaussian distribution.

110 of 110