1 of 50

Hands-On Computer Vision

ICTP-UNU Workshop on TinyML

for Sustainable Development

2 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

Step 3: Test & Deploy Your Model

3 of 50

Main Content

0. Introduction to Machine Vision


9 of 50

Introduction to Machine Vision

A greyscale image can be understood as a grid of pixel values ranging from dark (0) to light (up to 255); the closer a value is to 255, the brighter that area is.
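As a minimal sketch of where such a value comes from, a colour pixel can be reduced to one 0-255 greyscale value. The weights below are the common ITU-R BT.601 luma coefficients, an assumption on our part since the slides do not name a formula:

```cpp
#include <cstdint>

// Reduce an RGB pixel to a single 0-255 greyscale value using the widely
// used ITU-R BT.601 luma weights (one common choice among several).
uint8_t rgb_to_grey(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>(0.299 * r + 0.587 * g + 0.114 * b + 0.5);
}
```

Black (0, 0, 0) maps to 0 and white (255, 255, 255) maps to 255, matching the dark-to-light scale described above.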


11 of 50

Introduction to Machine Vision

Shape: 3-D array (480 × 540 × 3)

Height × Width × Colour Channels
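Concretely, a Height × Width × Channels image stored in row-major order keeps each element at a single flat offset. A sketch of the indexing, using the slide's dimensions:

```cpp
#include <cstddef>

// Dimensions from the slide: 480 (height) x 540 (width) x 3 (colour channels).
constexpr std::size_t HEIGHT = 480, WIDTH = 540, CHANNELS = 3;

// Flat offset of element (row, col, channel) in row-major storage:
// advance a full row of pixels per row, a full pixel per column, one
// value per channel.
constexpr std::size_t offset(std::size_t row, std::size_t col, std::size_t ch) {
    return (row * WIDTH + col) * CHANNELS + ch;
}
```

The last element, offset(479, 539, 2), lands exactly at HEIGHT × WIDTH × CHANNELS − 1, confirming the array holds 777,600 values in total.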


16 of 50

Introduction to Machine Vision

Convolutional Layers

  • Convolutional layers use a set of learnable filters (also known as convolutional kernels) that slide over the input image and perform convolution operations.

  • Each filter extracts specific features from the image, such as edges, textures, or shapes.

  • The result of the convolution operation is a set of feature maps that represent the response of different features at various locations in the image.

  • By using multiple filters, convolutional layers can extract a variety of features from the image.
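The sliding-window operation above can be sketched in a few lines. This is a "valid" convolution (no padding) with a single 3×3 kernel, an illustrative simplification of what a convolutional layer does with many learnable kernels:

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// Slide a 3x3 kernel over the image with no padding: each output value is
// the weighted sum of the 3x3 neighbourhood under the kernel, so an
// H x W input yields an (H-2) x (W-2) feature map.
Mat convolve3x3(const Mat& img, const float k[3][3]) {
    std::size_t H = img.size(), W = img[0].size();
    Mat out(H - 2, std::vector<float>(W - 2, 0.0f));
    for (std::size_t y = 0; y + 2 < H; ++y)
        for (std::size_t x = 0; x + 2 < W; ++x)
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j)
                    out[y][x] += img[y + i][x + j] * k[i][j];
    return out;
}
```

With a vertical-edge kernel such as {{-1,0,1},{-1,0,1},{-1,0,1}}, the feature map responds strongly wherever dark pixels sit to the left of bright ones, which is exactly the "edge feature" behaviour described above.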

17 of 50

Introduction to Machine Vision

Pooling Layers

  • Pooling layers are used to reduce the spatial dimensions of the feature maps while retaining the most important feature information.

  • The most common pooling operation is max pooling, which selects the maximum value within each region of the feature map as the representative of that region.

  • Pooling operations help to reduce the size of the feature maps, thereby reducing the computational burden of subsequent layers and providing some level of translation invariance.
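A minimal sketch of max pooling, assuming the common 2×2 window with stride 2 (other window sizes and strides are possible):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// 2x2 max pooling with stride 2: each non-overlapping 2x2 region is
// replaced by its maximum, halving both spatial dimensions while keeping
// the strongest feature response in each region.
Mat maxpool2x2(const Mat& in) {
    std::size_t H = in.size() / 2, W = in[0].size() / 2;
    Mat out(H, std::vector<float>(W));
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            out[y][x] = std::max({in[2 * y][2 * x],     in[2 * y][2 * x + 1],
                                  in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]});
    return out;
}
```

A 4×4 feature map becomes 2×2, which is why pooling reduces the computation required by the layers that follow.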

18 of 50

Introduction to Machine Vision

Fully Connected Layers

  • After multiple convolutional and pooling layers, CNNs typically employ one or more fully connected layers for the final classification or prediction task.

  • Fully connected layers flatten the features extracted by the previous layers and apply a weight matrix to transform them into the final output.

  • Fully connected layers can learn complex relationships between features and generate the desired output based on the task requirements, such as the probability of an image belonging to a specific class.
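A sketch of the two operations a fully connected classifier head performs on the flattened features: a weight-matrix transform plus bias, then picking the highest-scoring class. The tiny sizes are illustrative only:

```cpp
#include <cstddef>
#include <vector>

// Fully connected layer: out = W x + b. Each output unit is a weighted
// sum over every flattened input feature, plus a bias.
std::vector<float> dense(const std::vector<float>& x,
                         const std::vector<std::vector<float>>& W,
                         const std::vector<float>& b) {
    std::vector<float> out(W.size());
    for (std::size_t i = 0; i < W.size(); ++i) {
        float sum = b[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            sum += W[i][j] * x[j];
        out[i] = sum;
    }
    return out;
}

// For classification, the predicted class is the index of the largest output.
std::size_t argmax(const std::vector<float>& v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] > v[best]) best = i;
    return best;
}
```

In practice the raw outputs are usually passed through a softmax to become class probabilities, but the predicted label is the same either way, since softmax preserves the ordering.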

19 of 50

Image Classification vs. Object Detection

Image Classification

  • Purpose: To categorize an entire image into a single class or label.
  • Output: A single label or class indicating what the main object in the image is.
  • How it works: The algorithm processes the whole image and assigns it to one of the predefined categories.
  • Use Cases:
    • Identifying whether an image contains a cat or a dog.
    • Classifying handwritten digits.
    • Detecting the presence of certain types of tumors in medical images.
  • Examples:
    • Convolutional Neural Networks (CNNs) used for classifying handwritten digits in the MNIST dataset.
    • ImageNet competition models that classify images into 1000 different categories.

20 of 50

Image Classification vs. Object Detection

Object Detection

  • Purpose: To identify and locate multiple objects within an image.
  • Output: Multiple bounding boxes around detected objects, each with a corresponding label and confidence score.
  • How it works: The algorithm not only classifies objects but also determines their positions in the image by drawing bounding boxes around them.
  • Use Cases:
    • Autonomous driving (detecting pedestrians, cars, traffic signs, etc.).
    • Surveillance systems (detecting suspicious activities or objects).
    • Retail (monitoring stock levels on shelves).
  • Examples:
    • YOLO (You Only Look Once) for real-time object detection.
    • Faster R-CNN for detecting objects with high accuracy.
    • SSD (Single Shot MultiBox Detector) for fast object detection.
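To make the output format above concrete, here is a sketch of how a detector's results are typically represented and filtered by confidence. The type and field names are ours for illustration, not any specific library's API:

```cpp
#include <string>
#include <vector>

// Illustrative representation of one detector result.
struct Detection {
    std::string label;  // predicted class
    float confidence;   // score in [0, 1]
    int x, y, w, h;     // bounding box (top-left corner, width, height) in pixels
};

// Keep only detections at or above the confidence threshold, a standard
// way to trade off false positives against false negatives.
std::vector<Detection> filter_detections(const std::vector<Detection>& dets,
                                         float threshold) {
    std::vector<Detection> kept;
    for (const Detection& d : dets)
        if (d.confidence >= threshold) kept.push_back(d);
    return kept;
}
```

This per-object structure (label + confidence + box) is exactly what separates detection output from classification's single label per image.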

21 of 50

Image Classification vs. Object Detection

Key Differences

  • Scope: Image Classification deals with the whole image, whereas Object Detection focuses on identifying and locating multiple objects within the image.
  • Complexity: Object Detection is generally more complex due to the additional task of locating objects.
  • Output: Image Classification gives a single label per image, while Object Detection provides multiple labels and bounding boxes.

Summary

  • Image Classification is simpler and is used when the goal is to assign a single category to an entire image.
  • Object Detection is more advanced and is used when the goal is to identify and locate multiple objects within an image.

22 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

23 of 50

Step 1.1.1 Before Preparation

  • Hardware:
    • Seeed Studio XIAO ESP32S3 Sense x1

  • Software:
    • Arduino IDE.
    • Seeed XIAO ESP32S3 Sense library
    • Edge Impulse Studio

24 of 50

Step 1.1.2 Environment Preparation

  1. Visit the official Arduino website: https://www.arduino.cc/en/software

  2. Click the "Windows" or "Mac" button for your operating system.

  3. Download the Arduino IDE 1.8.19 installer.

  4. Once the download is complete, run the installer.

  5. Follow the installation wizard, accepting the license agreement and choosing the installation directory.

  6. If prompted, allow the installer to install device drivers.

  7. Once the installation is finished, click "Close" to exit the installer.

  8. Open the Arduino IDE from the desktop shortcut or the Start menu.

  9. You're now ready to start using Arduino IDE 1.8.19!

25 of 50

1. Open the Arduino IDE.

2. Go to File > Preferences.

3. In the "Additional Boards Manager URLs" field, enter the following URL:

https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json

4. Click "OK" to close the Preferences window.

Step 1.1.3 Install XIAO ESP32S3 on Arduino IDE

26 of 50

5. Navigate to Tools > Board > Boards Manager.

6. In the Boards Manager window, search for "ESP32".

7. Locate the "ESP32 by Espressif Systems" entry and click on it.

8. Select the latest version from the dropdown menu and click "Install".

9. Wait for the installation process to complete. This may take a few minutes.

10. Once the installation is finished, close the Boards Manager window.

Step 1.1.3 Install XIAO ESP32S3 on Arduino IDE

27 of 50

  1. Open the Arduino IDE and select the XIAO_ESP32S3 board (and the port it is connected to). Under File > Examples > ESP32 > Camera, select CameraWebServer.

2. Define the XIAO model pins:

#define CAMERA_MODEL_XIAO_ESP32S3 // Has PSRAM

3. On Tools, enable PSRAM. Then enter your WiFi credentials in the sketch and upload the code to the device.

4. Copy the address shown on the Serial Monitor.

Step 1.2.1 Open the Dataset Collection Program

28 of 50

5. Open the webpage and click "Start Stream".

6. Save photos of the objects you want to detect.


We suggest around 50 images, mixing the objects and varying the number of each appearing in the scene. Try to capture different angles, backgrounds, and light conditions.

The stored images use a QVGA frame size of 320x240 and RGB565 (color pixel format).

Step 1.2.2 Dataset Collection via XIAO ESP32S3
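RGB565 packs a colour pixel into 16 bits: 5 bits red, 6 bits green, 5 bits blue. As a sketch of how such a pixel expands back to the usual 8-bit-per-channel RGB (a standard bit-replication scaling, shown here for illustration):

```cpp
#include <cstdint>

// Unpack one RGB565 pixel (5 bits red, 6 bits green, 5 bits blue) into
// 8-bit-per-channel RGB.
void rgb565_to_rgb888(uint16_t px, uint8_t& r, uint8_t& g, uint8_t& b) {
    r = (px >> 11) & 0x1F;  // top 5 bits
    g = (px >> 5)  & 0x3F;  // middle 6 bits
    b =  px        & 0x1F;  // bottom 5 bits
    // Scale each channel up to the 0-255 range by replicating high bits.
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
}
```

With this scaling, full-scale channel values map cleanly: 0x0000 decodes to black and 0xFFFF to white, so no brightness range is lost in the round trip.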

29 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

30 of 50

Step 2.1 Setup the Edge Impulse Project

  1. Edge Impulse Website

https://edgeimpulse.com/

2. Create a new account, then log in.

3. Create a new project and name it:

"XIAO-ESP32S3-Sense-Object_Detection"

31 of 50

Step 2.1 Setup the Platform Dashboard

On your Project Dashboard, scroll down to Project info.

Select Bounding boxes (object detection) as the labeling method.

Then select Espressif ESP-EYE (the board most similar to ours) as your Target Device.

32 of 50

Step 2.2 Uploading the unlabeled data

1. On Studio, go to the Data Acquisition tab and upload the captured files as a folder from your computer in the UPLOAD DATA section.

2. All the unlabeled images (47 in this example) are now uploaded, but they must be labeled appropriately before being used as the project dataset.

33 of 50

Step 2.3 Labeling the Dataset

  1. Use your mouse to drag a box around an object to add a label.

  2. Then click Save labels to advance to the next item.

  3. Continue with this process until the queue is empty.

  4. At the end, all images should have their objects labeled, as in the samples below.

  5. Review the labeled samples on the Data acquisition tab. If a label is wrong, you can edit it using the three-dots menu after the sample name.

34 of 50

Step 2.4 The Impulse Design Setup

1. Pre-processing consists of resizing the individual images from 320 x 240 to 96 x 96 and squashing them (squared form, without cropping). Afterward, the images are converted from RGB to Grayscale.

2. Design a Model, in this case, “Object Detection.”
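The "squashing" resize in step 1 can be sketched as nearest-neighbour sampling: every target pixel maps straight back to a source pixel, so the 320 × 240 frame is stretched into a 96 × 96 square without cropping. This is an illustrative implementation; Edge Impulse's actual resampling method may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resize a greyscale image from (sw x sh) to (dw x dh) by nearest-neighbour
// sampling. Aspect ratio is NOT preserved: the image is squashed, not cropped.
std::vector<uint8_t> squash(const std::vector<uint8_t>& src,
                            std::size_t sw, std::size_t sh,
                            std::size_t dw, std::size_t dh) {
    std::vector<uint8_t> dst(dw * dh);
    for (std::size_t y = 0; y < dh; ++y)
        for (std::size_t x = 0; x < dw; ++x)
            dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
    return dst;
}
```

Because 320:240 is not 1:1, objects come out slightly distorted horizontally; the model simply learns from the squashed images, which is why the same squash must be applied at inference time.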

35 of 50

Step 2.5 Preprocessing all dataset

1. In this section, select Grayscale as the Color depth (suitable for FOMO models) and click Save parameters.

36 of 50

Step 2.6 Model Design and Training

1. Use FOMO, an object detection model based on MobileNetV2 (alpha 0.35) designed to coarsely segment an image into a grid of background vs objects of interest (here, boxes and wheels).

2. Regarding the training hyper-parameters, the model will be trained with:

Epochs: 60

Batch size: 32

Learning Rate: 0.001.

37 of 50

About FOMO

FOMO is an innovative machine learning model for object detection that uses up to 30 times less energy and memory than traditional models like MobileNet SSD and YOLOv5.

It can operate on microcontrollers with less than 200 KB of RAM by focusing on object location rather than size.

How FOMO Works

  1. Grayscale Image Input: FOMO converts the image to grayscale.
  2. Pixel Blocks: The image is divided into blocks using a factor of 8 (e.g., a 96x96 image becomes a 12x12 grid).
  3. Probability Calculation: A classifier runs through each block to calculate the probability of containing an object, classifying blocks without objects as background.
  4. Centroid Coordinates: From the regions with the highest probabilities, FOMO determines the centroid coordinates of the objects.
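The grid arithmetic in steps 2 and 4 can be sketched directly. The factor of 8 comes from the slide; treating a centroid as the centre of its grid cell is a simplifying assumption for illustration:

```cpp
#include <cstddef>

// FOMO's reduction factor: each grid cell covers an 8x8 block of input pixels.
constexpr std::size_t FACTOR = 8;

// Number of grid cells along one side of a square input
// (e.g. a 96x96 image yields a 12x12 grid).
constexpr std::size_t grid_cells(std::size_t input_size) {
    return input_size / FACTOR;
}

// Centre of grid cell index `cell` in input-pixel coordinates: FOMO reports
// object locations at this granularity rather than as full bounding boxes.
constexpr std::size_t centroid(std::size_t cell) {
    return cell * FACTOR + FACTOR / 2;
}
```

This coarse grid is why FOMO fits in under 200 KB of RAM: it predicts one class per cell instead of regressing box sizes.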

38 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

Step 3: Test & Deploy Your Model

39 of 50

Step 3.1 Test your Model

1. Once our model is trained, we can test it using the Live Classification tool. In the corresponding section, click the Connect a development board icon (a small MCU) and scan the QR code with your phone.

2. Once connected, you can use the smartphone to capture actual images to be tested by the trained model on Edge Impulse Studio.

3. Note that the model can produce false positives and negatives. These can be minimized by setting a proper Confidence Threshold (use the three-dots menu for the setup). Try 0.8 or higher.

40 of 50

Step 3.2 Export the Model

  1. Select the Arduino Library and Quantized (int8) model, enable the EON Compiler on the Deploy Tab, and press [Build].

41 of 50

Step 3.3 Import the Model into Arduino IDE

  1. Open your Arduino IDE and, under Sketch, go to Include Library > Add .ZIP Library. Select the file you downloaded from Edge Impulse Studio.

  2. Under the Examples tab in the Arduino IDE, you should find a sketch (esp32 > esp32_camera) under your project name.

42 of 50

Step 3.4 Change the Settings on XIAO ESP32S3

Set the XIAO ESP32S3 camera pins.

Change lines 32 to 75 of the sketch, which define the camera model and pins, to match our board. Copy the lines below and paste them over lines 32-75:

#define PWDN_GPIO_NUM  -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM  10
#define SIOD_GPIO_NUM  40
#define SIOC_GPIO_NUM  39
#define Y9_GPIO_NUM    48
#define Y8_GPIO_NUM    11
#define Y7_GPIO_NUM    12
#define Y6_GPIO_NUM    14
#define Y5_GPIO_NUM    16
#define Y4_GPIO_NUM    18
#define Y3_GPIO_NUM    17
#define Y2_GPIO_NUM    15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM  47
#define PCLK_GPIO_NUM  13

43 of 50

Step 3.5 Deploy the Model on XIAO ESP32S3

44 of 50

Thank you!

45 of 50

Let's see who achieves it first!

46 of 50

Congratulations!

47 of 50

Co-Create Gadget

48 of 50

Co-Invent Solutions

Based on various digital transformation scenarios, we continuously develop smart devices that integrate the latest technologies, and we work closely with developers and industry experts to provide software and hardware solutions for vertical industries at multiple levels.

IoT Devices

Solution

Industrial Know-How

Smart Agriculture

Smart Greenhouse

Energy Monitoring

Carbon Monitoring

Smart City

Smart Traffic

Software, Algorithms, Industrial Insights

Non-engineer Experts

Embodied AI as a bridge between science and application

49 of 50

50 of 50

Make Profit From Your Ideas with Co-Create