Developing a Simple Classification Model Using Sample Data

From data preparation to model evaluation

Dr. Jamolbek Mattiev

What is a Classification Model?

  • A classification model assigns class labels to data instances
  • It is a type of supervised machine learning
  • The model learns from labeled training data

Examples:

  • Email → Spam / Not Spam
  • Tumor → Benign / Malignant

Objective of the Model

  • Learn patterns from sample data
  • Predict class labels for unseen data
  • Evaluate model performance using statistical measures

Sample Data for Classification

Sample data consists of:

  • Attributes (features) – input variables
  • Class attribute – target label
  • Instances – data records

Example table (column headers):

Age | Size | Texture | Class
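To make the three terms concrete, here is a minimal Python sketch of how such a table could be held in memory. The attribute names come from the example table; the values, the `instances` list, and the helper `split_features_and_label` are hypothetical and invented purely for illustration.

```python
# Hypothetical sample data: all values are invented for illustration only.
ATTRIBUTES = ["Age", "Size", "Texture"]   # attributes (features): input variables
CLASS_ATTRIBUTE = "Class"                 # class attribute: target label

# Each dictionary is one instance (data record).
instances = [
    {"Age": 45, "Size": 1.2, "Texture": "smooth", "Class": "Benign"},
    {"Age": 61, "Size": 3.4, "Texture": "rough", "Class": "Malignant"},
    {"Age": 38, "Size": 0.9, "Texture": "smooth", "Class": "Benign"},
]

def split_features_and_label(instance):
    """Separate the input attributes from the class label of one instance."""
    features = {name: instance[name] for name in ATTRIBUTES}
    label = instance[CLASS_ATTRIBUTE]
    return features, label

features, label = split_features_and_label(instances[0])
print(features, "->", label)
```

A classifier only ever sees the features at prediction time; the label is what it is asked to produce.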

Preparing Sample Data

Steps:

  • Define the problem
  • Select relevant attributes
  • Assign class labels
  • Store data in CSV or ARFF format
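The last step can be sketched in Python with the standard library alone. This is an illustrative sketch, not part of WEKA: the function names `to_csv` and `to_arff`, the relation name, and the two records are assumptions, and the ARFF attribute types are declared by hand for this particular table.

```python
import csv
import io

# Hypothetical records; column names follow the example table, values are invented.
HEADER = ["Age", "Size", "Texture", "Class"]
ROWS = [
    [45, 1.2, "smooth", "Benign"],
    [61, 3.4, "rough", "Malignant"],
]

def to_csv(header, rows):
    """Return the data as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

def to_arff(relation, rows):
    """Return the same data as ARFF text (types are hard-coded for this table)."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute Age numeric",
        "@attribute Size numeric",
        "@attribute Texture {smooth,rough}",   # nominal attribute: list its values
        "@attribute Class {Benign,Malignant}",  # class attribute, also nominal
        "",
        "@data",
    ]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"

print(to_csv(HEADER, ROWS))
print(to_arff("tumor", ROWS))
```

The main difference between the two formats is that ARFF declares each attribute's type in a header section, whereas CSV leaves the types to be inferred when the file is loaded.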

Choosing a Simple Classifier

Common simple classifiers:

  • Naive Bayes
  • J48 (WEKA's C4.5 decision tree)
  • k-Nearest Neighbors (k-NN)

These are easy to understand and suitable for beginners.
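To show how little machinery such a classifier needs, here is a minimal k-NN sketch in plain Python. It is not WEKA's implementation: the function `knn_predict`, the Euclidean distance choice, and the six training points are assumptions made for illustration, and it handles numeric features only.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict the majority label among the k training points closest to `query`.

    `training` is a list of (feature_vector, label) pairs with numeric features.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training set (values invented for illustration).
training = [
    ((1.0, 1.0), "Benign"),
    ((1.2, 0.8), "Benign"),
    ((0.9, 1.1), "Benign"),
    ((3.0, 3.2), "Malignant"),
    ((3.1, 2.9), "Malignant"),
    ((2.8, 3.0), "Malignant"),
]

print(knn_predict(training, (1.1, 1.0)))
print(knn_predict(training, (3.0, 3.0)))
```

With two classes, an odd `k` avoids tied votes. There is no separate training step: the "model" is simply the stored training data.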

Loading Data into WEKA

Steps:

  • Open WEKA GUI Chooser
  • Select Explorer
  • Open Preprocess tab
  • Load sample dataset
  • Set the class attribute

Building the Classification Model

In WEKA:

  • Go to Classify tab
  • Choose a classifier
  • Select test method
    • Cross-validation
    • Percentage split
  • Click Start
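The two test methods differ only in how instances are divided between training and testing. The sketch below illustrates the idea in plain Python; it is not WEKA's code. The function names are hypothetical, the splits are plain shuffles without stratification by class, and the defaults of 66% and 10 folds are assumed here to match the values the Explorer typically pre-fills.

```python
import random

def percentage_split(n_instances, train_percent=66, seed=1):
    """Shuffle instance indices and split them into train and test index lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_percent / 100)
    return indices[:cut], indices[cut:]

def cross_validation_folds(n_instances, n_folds=10, seed=1):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # Train on every fold except the one currently held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

train, test = percentage_split(100)
print(len(train), "training instances,", len(test), "test instances")

for train, test in cross_validation_folds(20, n_folds=5):
    print(len(train), "train /", len(test), "test")
```

With a percentage split each instance is used once, either for training or for testing. With cross-validation every instance is tested exactly once and used for training in all other folds, which tends to give a steadier estimate on small datasets.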

Model Training and Testing

  • WEKA splits the data automatically according to the selected test method
  • The model is trained on the training portion of the sample data
  • Testing on the held-out portion estimates how well the model generalizes

Evaluating Model Performance

WEKA provides: Accuracy, Precision, Recall, F-measure, ROC Area
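For a two-class problem, the first four measures follow directly from the counts of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn). The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical, and ROC Area is left out because it needs ranked prediction scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from the four counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of everything predicted positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts, invented for illustration.
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

The zero checks return 0.0 when a denominator would be zero, for example when the model never predicts the positive class; that convention is a choice made for this sketch.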

Confusion Matrix

A confusion matrix shows:

  • Correct predictions
  • Incorrect predictions

It helps analyze classification errors.

Example:

  • Accuracy: 92%
  • High recall → fewer false negatives
  • Balanced precision and recall → reliable model
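A confusion matrix can be built by counting (actual, predicted) label pairs. The following is a minimal Python sketch with rows for the actual class and columns for the predicted class; the function names and the eight labeled emails are hypothetical and unrelated to the 92% figure above.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a matrix where row = actual class and column = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal; divide them by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical predictions for eight emails (invented for illustration).
labels = ["Spam", "Not Spam"]
actual = ["Spam", "Spam", "Spam", "Not Spam",
          "Not Spam", "Not Spam", "Not Spam", "Spam"]
predicted = ["Spam", "Spam", "Not Spam", "Not Spam",
             "Not Spam", "Spam", "Not Spam", "Spam"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
print("accuracy:", accuracy_from_matrix(matrix))
```

Off-diagonal cells show which kind of error occurs: here, one spam email is missed (a false negative for the Spam class) and one legitimate email is flagged (a false positive).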

Improving the Simple Model

Possible improvements:

  • Feature selection
  • Data normalization
  • Trying another classifier
  • Increasing training data
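As one example from this list, data normalization can be sketched as min-max scaling, which maps a numeric attribute onto the range [0, 1]. This is a hand-written illustration rather than a WEKA filter; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]

# Hypothetical attribute values on very different scales.
ages = [10, 20, 30]
sizes = [0.5, 1.0, 2.5]

print(min_max_normalize(ages))
print(min_max_normalize(sizes))
```

Normalization matters most for distance-based classifiers such as k-NN, where an attribute with a large numeric range would otherwise dominate the distance.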

Advantages of Simple Models

  • Easy to understand
  • Fast training
  • Good baseline performance
  • Suitable for educational purposes

Summary

  • Sample data is the foundation of classification
  • Simple models help understand ML concepts
  • WEKA makes model development easy and visual
  • Proper evaluation is needed before the results can be trusted