DECISION TREE – ID3 Algorithm

Mr Rajkumar D

Assistant Professor (S.G)

MCA DEPARTMENT

SRMIST, Ramapuram

DECISION TREE - INTRODUCTION

  • Decision trees are powerful and popular tools for classification and prediction.
  • Decision trees represent rules. A decision tree is a classifier in the form of a tree structure where each node is either:
  • a leaf node, indicating a class of instances, or
  • a decision node, specifying some test to be carried out on a single attribute value, with one branch and sub-tree for each possible outcome of the test.
  • A decision tree classifies an instance by starting at the root of the tree and moving through it until a leaf node is reached; the leaf provides the classification of the instance.
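The root-to-leaf traversal described above can be sketched as follows; the nested-dict representation and the example tree are illustrative assumptions, not taken from the slides.

```python
# Sketch: classify an instance by walking a decision tree from the root
# down to a leaf. The tree below is a hypothetical example.

def classify(tree, instance):
    """Follow decision nodes until a leaf (a class label) is reached."""
    while isinstance(tree, dict):                    # decision node
        attribute = next(iter(tree))                 # attribute tested here
        tree = tree[attribute][instance[attribute]]  # follow matching branch
    return tree                                      # leaf node: the class

# Hypothetical two-level tree: Outlook at the root, Humidity below Sunny.
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": "Yes"}}
print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```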

Constructing Decision Trees

  • Decision tree programs construct a decision tree T from a set of training cases. The original idea of decision tree construction goes back to the work of Hovland and Hunt on Concept Learning Systems (CLS) in the late 1950s.
  • The algorithm consists of five steps:
  1. T ← the whole training set. Create a T node.
  2. If all examples in T are positive, create a ‘P’ node with T as its parent and stop.
  3. If all examples in T are negative, create an ‘N’ node with T as its parent and stop.
  4. Select an attribute X with values v1, v2, …, vN and partition T into subsets T1, T2, …, TN according to their values on X. Create N nodes Ti (i = 1, …, N) with T as their parent and X = vi as the label of the branch from T to Ti.
  5. For each Ti: T ← Ti and go to step 2.
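The five steps above can be sketched as a recursive function, assuming binary ‘P’/‘N’ labels and taking attributes in the given order for step 4 (ID3's entropy-based selection comes later in the deck):

```python
# CLS-style construction (steps 1-5 above). Training cases are
# (attribute-dict, label) pairs; labels are 'P' or 'N'.

def build(T, attributes):
    labels = {label for _, label in T}
    if labels == {"P"}:                  # step 2: all positive -> 'P' node
        return "P"
    if labels == {"N"}:                  # step 3: all negative -> 'N' node
        return "N"
    X = attributes[0]                    # step 4: select an attribute X ...
    node = {X: {}}
    for v in {case[X] for case, _ in T}:         # ... one branch per value
        Ti = [(case, label) for case, label in T if case[X] == v]
        node[X][v] = build(Ti, attributes[1:])   # step 5: recurse on Ti
    return node

training = [({"Windy": "True"}, "N"), ({"Windy": "False"}, "P")]
print(build(training, ["Windy"]))
```

The sketch assumes the attributes suffice to separate the classes; a full implementation would also handle the case where attributes run out.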

SAMPLE DECISION TREE

ID3 Decision Tree Algorithm

  • J. Ross Quinlan originally developed ID3 at the University of Sydney.
  • ID3 dates from the mid-1970s and was later described in Quinlan's paper “Induction of Decision Trees” in the journal Machine Learning, vol. 1, no. 1. ID3 is based on the Concept Learning System (CLS) algorithm.
  • In the decision tree, each node corresponds to a non-goal attribute and each arc to a possible value of that attribute.
  • A leaf of the tree specifies the expected value of the goal attribute for the records described by the path from the root to that leaf.
  • Each node should be associated with the non-goal attribute that is most informative among the attributes not yet considered on the path from the root.
  • Entropy is used to measure how informative a node is.
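Entropy here is the usual Shannon measure over the class proportions at a node, E(S) = −Σᵢ pᵢ log₂ pᵢ; a minimal sketch:

```python
import math

# Entropy of a collection of class labels: 0 for a pure node,
# 1 bit for an even two-class split.

def entropy(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["Yes", "No"]))  # even split -> 1.0
```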

SAMPLE DATASET
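The table on this slide did not survive text extraction. The attribute names used later in the deck (Outlook, Temperature, Humidity, Windy) match Quinlan's standard 14-example weather (“play tennis”) dataset, reproduced here as an assumed stand-in:

```python
# Assumed dataset: Quinlan's standard 14-example weather data.
# Each row: (Outlook, Temperature, Humidity, Windy, Play)
dataset = [
    ("Sunny",    "Hot",  "High",   "False", "No"),
    ("Sunny",    "Hot",  "High",   "True",  "No"),
    ("Overcast", "Hot",  "High",   "False", "Yes"),
    ("Rain",     "Mild", "High",   "False", "Yes"),
    ("Rain",     "Cool", "Normal", "False", "Yes"),
    ("Rain",     "Cool", "Normal", "True",  "No"),
    ("Overcast", "Cool", "Normal", "True",  "Yes"),
    ("Sunny",    "Mild", "High",   "False", "No"),
    ("Sunny",    "Cool", "Normal", "False", "Yes"),
    ("Rain",     "Mild", "Normal", "False", "Yes"),
    ("Sunny",    "Mild", "Normal", "True",  "Yes"),
    ("Overcast", "Mild", "High",   "True",  "Yes"),
    ("Overcast", "Hot",  "Normal", "False", "Yes"),
    ("Rain",     "Mild", "High",   "True",  "No"),
]
print(sum(row[-1] == "Yes" for row in dataset), "Yes /",
      sum(row[-1] == "No" for row in dataset), "No")  # -> 9 Yes / 5 No
```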

IMPORTANT NOTATIONS

STEPS TO BE FOLLOWED

ENTROPY FOR ENTIRE DATASET

COMPLETE DATASET - ENTROPY
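The slide's working was lost in extraction; assuming the standard 9 Yes / 5 No class split, the whole-dataset entropy works out as:

```python
import math

# E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) for 9 Yes / 5 No examples.
E_S = -(9/14) * math.log2(9/14) - (5/14) * math.log2(5/14)
print(round(E_S, 3))  # -> 0.94
```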

ENTROPY FOR OUTLOOK

AVERAGE INFORMATION - OUTLOOK

INFORMATION GAIN - OUTLOOK
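Under the assumed standard dataset, Outlook partitions the 14 examples into Sunny (2 Yes / 3 No), Overcast (4 / 0) and Rain (3 / 2); the average information and gain would then be:

```python
import math

def entropy(p, n):
    """Entropy of a node with p positive and n negative examples."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

# Outlook partitions (Yes, No), assumed from the standard dataset.
partitions = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}

# Average information I(Outlook) = sum_v |S_v|/|S| * E(S_v)
I_outlook = sum(((p + n) / 14) * entropy(p, n) for p, n in partitions.values())
# Gain(Outlook) = E(S) - I(Outlook)
gain_outlook = entropy(9, 5) - I_outlook
print(round(I_outlook, 3), round(gain_outlook, 3))  # -> 0.694 0.247
```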

ENTROPY - TEMPERATURE

AVERAGE INFORMATION - TEMPERATURE

INFORMATION GAIN - TEMPERATURE

ENTROPY - HUMIDITY

AVERAGE INFORMATION - HUMIDITY

ENTROPY - WINDY

AVERAGE INFORMATION - WINDY

INFORMATION GAIN - WINDY

HIGHEST GAIN ATTRIBUTE
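Putting the four attributes side by side (per-value Yes/No counts assumed from the standard dataset), Outlook has the largest gain and is chosen as the root:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

E_S = entropy([9, 5])  # whole-dataset entropy

# (Yes, No) counts per attribute value, assumed from the standard dataset.
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],  # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],          # High, Normal
    "Windy":       [(3, 3), (6, 2)],          # True, False
}
gains = {a: E_S - sum(((p + n) / 14) * entropy([p, n]) for p, n in parts)
         for a, parts in splits.items()}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # -> Outlook 0.247
```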

OUTLOOK ATTRIBUTE - ROOT

ENTROPY - SUNNY

ENTROPY, AI, IG - HUMIDITY
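For the Sunny branch (2 Yes / 3 No under the assumed standard dataset), Humidity separates the examples perfectly, so its gain equals the branch entropy:

```python
import math

# Sunny subset: 2 Yes / 3 No. Humidity -> High gives 3 No, Normal gives
# 2 Yes, so both children are pure and I_Sunny(Humidity) = 0.
E_sunny = -(2/5) * math.log2(2/5) - (3/5) * math.log2(3/5)
gain_humidity = E_sunny - 0.0   # Gain = E(Sunny) - average information
print(round(E_sunny, 3), round(gain_humidity, 3))  # -> 0.971 0.971
```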

ENTROPY, AI, IG - WINDY

ENTROPY, AI, IG - TEMPERATURE

HIGHEST GAIN ATTRIBUTE

ENTROPY - TEMPERATURE

ENTROPY - HUMIDITY

ENTROPY - WINDY

ENTROPY - TEMPERATURE

HIGHEST GAIN ATTRIBUTE
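For the Rain branch (3 Yes / 2 No under the assumed standard dataset), Windy splits the examples perfectly (True → 2 No, False → 3 Yes), making it the highest-gain choice for this subtree:

```python
import math

# Rain subset: 3 Yes / 2 No; Windy's two children are both pure,
# so the average information is 0 and the gain equals E(Rain).
E_rain = -(3/5) * math.log2(3/5) - (2/5) * math.log2(2/5)
gain_windy = E_rain - 0.0
print(round(gain_windy, 3))  # -> 0.971
```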

FINAL DECISION TREE
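The figure on this slide was lost in extraction; for the assumed standard dataset the resulting tree would place Outlook at the root, Humidity under Sunny, and Windy under Rain, e.g. as nested dicts:

```python
# Final ID3 tree for the assumed standard weather dataset.
final_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",           # Overcast examples are all 'Yes'
        "Rain":     {"Windy": {"True": "No", "False": "Yes"}},
    }
}
print(final_tree["Outlook"]["Overcast"])  # -> Yes
```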

Attendance

Thank You