1 of 50

Hands-On Computer Vision

ICTP-UNU Workshop on TinyML

for Sustainable Development

2 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

Step 3: Test & Deploy Your Model

3 of 50

Main Content

0. Introduction to Machine Vision


9 of 50

Introduction to Machine Vision

A greyscale image can be understood as a grid of pixel values ranging from dark (0) to light (up to 255); the closer a value is to 255, the brighter that area is.
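As a minimal sketch of where such a value comes from, a colour pixel can be reduced to one 0-255 greyscale value. The weights below are the common ITU-R BT.601 luma coefficients, an assumption on our part since the slides do not name a formula:

```cpp
#include <cstdint>

// Reduce an RGB pixel to a single 0-255 greyscale value using the widely
// used ITU-R BT.601 luma weights (one common choice among several).
uint8_t rgb_to_grey(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>(0.299 * r + 0.587 * g + 0.114 * b + 0.5);
}
```

Black (0, 0, 0) maps to 0 and white (255, 255, 255) maps to 255, matching the dark-to-light scale described above.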


11 of 50

Introduction to Machine Vision

Shape: 3-D array (480 × 540 × 3)

Height × Width × Colour Channels
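Concretely, a Height × Width × Channels image stored in row-major order keeps each element at a single flat offset. A sketch of the indexing, using the slide's dimensions:

```cpp
#include <cstddef>

// Dimensions from the slide: 480 (height) x 540 (width) x 3 (colour channels).
constexpr std::size_t HEIGHT = 480, WIDTH = 540, CHANNELS = 3;

// Flat offset of element (row, col, channel) in row-major storage:
// advance a full row of pixels per row, a full pixel per column, one
// value per channel.
constexpr std::size_t offset(std::size_t row, std::size_t col, std::size_t ch) {
    return (row * WIDTH + col) * CHANNELS + ch;
}
```

The last element, offset(479, 539, 2), lands exactly at HEIGHT × WIDTH × CHANNELS − 1, confirming the array holds 777,600 values in total.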


16 of 50

Introduction to Machine Vision

Convolutional Layers

  • Convolutional layers use a set of learnable filters (also known as convolutional kernels) that slide over the input image and perform convolution operations.

  • Each filter extracts specific features from the image, such as edges, textures, or shapes.

  • The result of the convolution operation is a set of feature maps that represent the response of different features at various locations in the image.

  • By using multiple filters, convolutional layers can extract a variety of features from the image.
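The sliding-window operation above can be sketched in a few lines. This is a "valid" convolution (no padding) with a single 3×3 kernel, an illustrative simplification of what a convolutional layer does with many learnable kernels:

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// Slide a 3x3 kernel over the image with no padding: each output value is
// the weighted sum of the 3x3 neighbourhood under the kernel, so an
// H x W input yields an (H-2) x (W-2) feature map.
Mat convolve3x3(const Mat& img, const float k[3][3]) {
    std::size_t H = img.size(), W = img[0].size();
    Mat out(H - 2, std::vector<float>(W - 2, 0.0f));
    for (std::size_t y = 0; y + 2 < H; ++y)
        for (std::size_t x = 0; x + 2 < W; ++x)
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j)
                    out[y][x] += img[y + i][x + j] * k[i][j];
    return out;
}
```

With a vertical-edge kernel such as {{-1,0,1},{-1,0,1},{-1,0,1}}, the feature map responds strongly wherever dark pixels sit to the left of bright ones, which is exactly the "edge feature" behaviour described above.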

17 of 50

Introduction to Machine Vision

Pooling Layers

  • Pooling layers are used to reduce the spatial dimensions of the feature maps while retaining the most important feature information.

  • The most common pooling operation is max pooling, which selects the maximum value within each region of the feature map as the representative of that region.

  • Pooling operations help to reduce the size of the feature maps, thereby reducing the computational burden of subsequent layers and providing some level of translation invariance.
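A minimal sketch of max pooling, assuming the common 2×2 window with stride 2 (other window sizes and strides are possible):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// 2x2 max pooling with stride 2: each non-overlapping 2x2 region is
// replaced by its maximum, halving both spatial dimensions while keeping
// the strongest feature response in each region.
Mat maxpool2x2(const Mat& in) {
    std::size_t H = in.size() / 2, W = in[0].size() / 2;
    Mat out(H, std::vector<float>(W));
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            out[y][x] = std::max({in[2 * y][2 * x],     in[2 * y][2 * x + 1],
                                  in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]});
    return out;
}
```

A 4×4 feature map becomes 2×2, which is why pooling reduces the computation required by the layers that follow.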

18 of 50

Introduction to Machine Vision

Fully Connected Layers

  • After multiple convolutional and pooling layers, CNNs typically employ one or more fully connected layers for the final classification or prediction task.

  • Fully connected layers flatten the features extracted by the previous layers and apply a weight matrix to transform them into the final output.

  • Fully connected layers can learn complex relationships between features and generate the desired output based on the task requirements, such as the probability of an image belonging to a specific class.
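A sketch of the two operations a fully connected classifier head performs on the flattened features: a weight-matrix transform plus bias, then picking the highest-scoring class. The tiny sizes are illustrative only:

```cpp
#include <cstddef>
#include <vector>

// Fully connected layer: out = W x + b. Each output unit is a weighted
// sum over every flattened input feature, plus a bias.
std::vector<float> dense(const std::vector<float>& x,
                         const std::vector<std::vector<float>>& W,
                         const std::vector<float>& b) {
    std::vector<float> out(W.size());
    for (std::size_t i = 0; i < W.size(); ++i) {
        float sum = b[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            sum += W[i][j] * x[j];
        out[i] = sum;
    }
    return out;
}

// For classification, the predicted class is the index of the largest output.
std::size_t argmax(const std::vector<float>& v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] > v[best]) best = i;
    return best;
}
```

In practice the raw outputs are usually passed through a softmax to become class probabilities, but the predicted label is the same either way, since softmax preserves the ordering.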

19 of 50

Image Classification vs. Object Detection

Image Classification

  • Purpose: To categorize an entire image into a single class or label.
  • Output: A single label or class indicating what the main object in the image is.
  • How it works: The algorithm processes the whole image and assigns it to one of the predefined categories.
  • Use Cases:
    • Identifying whether an image contains a cat or a dog.
    • Classifying handwritten digits.
    • Detecting the presence of certain types of tumors in medical images.
  • Examples:
    • Convolutional Neural Networks (CNNs) used for classifying handwritten digits in the MNIST dataset.
    • ImageNet competition models that classify images into 1000 different categories.

20 of 50

Image Classification vs. Object Detection

Object Detection

  • Purpose: To identify and locate multiple objects within an image.
  • Output: Multiple bounding boxes around detected objects, each with a corresponding label and confidence score.
  • How it works: The algorithm not only classifies objects but also determines their positions in the image by drawing bounding boxes around them.
  • Use Cases:
    • Autonomous driving (detecting pedestrians, cars, traffic signs, etc.).
    • Surveillance systems (detecting suspicious activities or objects).
    • Retail (monitoring stock levels on shelves).
  • Examples:
    • YOLO (You Only Look Once) for real-time object detection.
    • Faster R-CNN for detecting objects with high accuracy.
    • SSD (Single Shot MultiBox Detector) for fast object detection.
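To make the output format above concrete, here is a sketch of how a detector's results are typically represented and filtered by confidence. The type and field names are ours for illustration, not any specific library's API:

```cpp
#include <string>
#include <vector>

// Illustrative representation of one detector result.
struct Detection {
    std::string label;  // predicted class
    float confidence;   // score in [0, 1]
    int x, y, w, h;     // bounding box (top-left corner, width, height) in pixels
};

// Keep only detections at or above the confidence threshold, a standard
// way to trade off false positives against false negatives.
std::vector<Detection> filter_detections(const std::vector<Detection>& dets,
                                         float threshold) {
    std::vector<Detection> kept;
    for (const Detection& d : dets)
        if (d.confidence >= threshold) kept.push_back(d);
    return kept;
}
```

This per-object structure (label + confidence + box) is exactly what separates detection output from classification's single label per image.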

21 of 50

Image Classification vs. Object Detection

Key Differences

  • Scope: Image Classification deals with the whole image, whereas Object Detection focuses on identifying and locating multiple objects within the image.
  • Complexity: Object Detection is generally more complex due to the additional task of locating objects.
  • Output: Image Classification gives a single label per image, while Object Detection provides multiple labels and bounding boxes.

Summary

  • Image Classification is simpler and is used when the goal is to assign a single category to an entire image.
  • Object Detection is more advanced and is used when the goal is to identify and locate multiple objects within an image.

22 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

23 of 50

Step 1.1.1 Before Preparation

  • Hardware:
    • Seeed Studio XIAO ESP32S3 Sense x1

  • Software:
    • Arduino IDE.
    • Seeed XIAO ESP32S3 Sense library
    • Edge Impulse Studio

24 of 50

Step 1.1.2 Environment Preparation

  1. Visit the official Arduino website: https://www.arduino.cc/en/software

  2. Click the "Windows" or "Mac" button for your operating system.

  3. Download the Arduino IDE 1.8.19 installer.

  4. Once the download is complete, run the installer.

  5. Follow the installation wizard, accepting the license agreement and choosing the installation directory.

  6. If prompted, allow the installer to install device drivers.

  7. Once the installation is finished, click "Close" to exit the installer.

  8. Open the Arduino IDE from the desktop shortcut or the Start menu.

  9. You're now ready to start using Arduino IDE 1.8.19!

25 of 50

1. Open the Arduino IDE.

2. Go to File > Preferences.

3. In the "Additional Boards Manager URLs" field, enter the following URL:

https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json

4. Click "OK" to close the Preferences window.

Step 1.1.3 Install XIAO ESP32S3 on Arduino IDE

26 of 50

5. Navigate to Tools > Board > Boards Manager.

6. In the Boards Manager window, search for "ESP32".

7. Locate the "ESP32 by Espressif Systems" entry and click on it.

8. Select the latest version from the dropdown menu and click "Install".

9. Wait for the installation process to complete. This may take a few minutes.

10. Once the installation is finished, close the Boards Manager window.

Step 1.1.3 Install XIAO ESP32S3 on Arduino IDE

27 of 50

  1. Open the Arduino IDE and select the XIAO_ESP32S3 board (and the port it is connected to). Under File > Examples > ESP32 > Camera, select CameraWebServer.

2. Define the XIAO model pins:

#define CAMERA_MODEL_XIAO_ESP32S3 // Has PSRAM

3. On Tools, enable PSRAM. Then enter your WiFi credentials in the sketch and upload the code to the device.

4. Copy the address shown on the Serial Monitor.

Step 1.2.1 Open the Dataset Collection Program

28 of 50

5. Open the webpage and click "Start Stream".

6. Save photos of the objects you want to detect.


We suggest around 50 images, mixing the objects and varying the number of each appearing in the scene. Try to capture different angles, backgrounds, and light conditions.

The stored images use a QVGA frame size of 320x240 and RGB565 (color pixel format).

Step 1.2.2 Dataset Collection via XIAO ESP32S3
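RGB565 packs a colour pixel into 16 bits: 5 bits red, 6 bits green, 5 bits blue. As a sketch of how such a pixel expands back to the usual 8-bit-per-channel RGB (a standard bit-replication scaling, shown here for illustration):

```cpp
#include <cstdint>

// Unpack one RGB565 pixel (5 bits red, 6 bits green, 5 bits blue) into
// 8-bit-per-channel RGB.
void rgb565_to_rgb888(uint16_t px, uint8_t& r, uint8_t& g, uint8_t& b) {
    r = (px >> 11) & 0x1F;  // top 5 bits
    g = (px >> 5)  & 0x3F;  // middle 6 bits
    b =  px        & 0x1F;  // bottom 5 bits
    // Scale each channel up to the 0-255 range by replicating high bits.
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
}
```

With this scaling, full-scale channel values map cleanly: 0x0000 decodes to black and 0xFFFF to white, so no brightness range is lost in the round trip.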

29 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

30 of 50

Step 2.1 Setup the Edge Impulse Project

  1. Edge Impulse Website

https://edgeimpulse.com/

2. Create a new account, then log in.

3. Create a new project and name it:

"XIAO-ESP32S3-Sense-Object_Detection"

31 of 50

Step 2.1 Setup the Platform Dashboard

On your Project Dashboard, scroll down to Project info.

Select Bounding boxes (object detection) as the labeling method.

Then select Espressif ESP-EYE (the board most similar to ours) as your Target Device.

32 of 50

Step 2.2 Uploading the unlabeled data

1. On Studio, go to the Data Acquisition tab and upload the captured files as a folder from your computer in the UPLOAD DATA section.

2. All the unlabeled images (47 in this example) are now uploaded, but they must be labeled appropriately before being used as the project dataset.

33 of 50

Step 2.3 Labeling the Dataset

  1. Use your mouse to drag a box around an object to add a label.

  2. Then click Save labels to advance to the next item.

  3. Continue with this process until the queue is empty.

  4. At the end, all images should have their objects labeled, as in the samples below.

  5. Review the labeled samples on the Data acquisition tab. If a label is wrong, you can edit it using the three-dots menu after the sample name.

34 of 50

Step 2.4 The Impulse Design Setup

1. Pre-processing consists of resizing the individual images from 320 x 240 to 96 x 96 and squashing them (squared form, without cropping). Afterward, the images are converted from RGB to Grayscale.

2. Design a Model, in this case, “Object Detection.”
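The "squashing" resize in step 1 can be sketched as nearest-neighbour sampling: every target pixel maps straight back to a source pixel, so the 320 × 240 frame is stretched into a 96 × 96 square without cropping. This is an illustrative implementation; Edge Impulse's actual resampling method may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resize a greyscale image from (sw x sh) to (dw x dh) by nearest-neighbour
// sampling. Aspect ratio is NOT preserved: the image is squashed, not cropped.
std::vector<uint8_t> squash(const std::vector<uint8_t>& src,
                            std::size_t sw, std::size_t sh,
                            std::size_t dw, std::size_t dh) {
    std::vector<uint8_t> dst(dw * dh);
    for (std::size_t y = 0; y < dh; ++y)
        for (std::size_t x = 0; x < dw; ++x)
            dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
    return dst;
}
```

Because 320:240 is not 1:1, objects come out slightly distorted horizontally; the model simply learns from the squashed images, which is why the same squash must be applied at inference time.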

35 of 50

Step 2.5 Preprocessing all dataset

1. In this section, select Grayscale as the Color depth (suitable for FOMO models) and click Save parameters.

36 of 50

Step 2.6 Model Design and Training

1. Use FOMO, an object detection model based on MobileNetV2 (alpha 0.35) designed to coarsely segment an image into a grid of background vs objects of interest (here, boxes and wheels).

2. Regarding the training hyper-parameters, the model will be trained with:

Epochs: 60

Batch size: 32

Learning Rate: 0.001.

37 of 50

About FOMO

FOMO is an innovative machine learning model for object detection that uses up to 30 times less energy and memory than traditional models like MobileNet SSD and YOLOv5.

It can operate on microcontrollers with less than 200 KB of RAM by focusing on object location rather than size.

How FOMO Works

  1. Grayscale Image Input: FOMO converts the image to grayscale.
  2. Pixel Blocks: The image is divided into blocks using a factor of 8 (e.g., a 96x96 image becomes a 12x12 grid).
  3. Probability Calculation: A classifier runs through each block to calculate the probability of containing an object, classifying blocks without objects as background.
  4. Centroid Coordinates: From the regions with the highest probabilities, FOMO determines the centroid coordinates of the objects.
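The grid arithmetic in steps 2 and 4 can be sketched directly. The factor of 8 comes from the slide; treating a centroid as the centre of its grid cell is a simplifying assumption for illustration:

```cpp
#include <cstddef>

// FOMO's reduction factor: each grid cell covers an 8x8 block of input pixels.
constexpr std::size_t FACTOR = 8;

// Number of grid cells along one side of a square input
// (e.g. a 96x96 image yields a 12x12 grid).
constexpr std::size_t grid_cells(std::size_t input_size) {
    return input_size / FACTOR;
}

// Centre of grid cell index `cell` in input-pixel coordinates: FOMO reports
// object locations at this granularity rather than as full bounding boxes.
constexpr std::size_t centroid(std::size_t cell) {
    return cell * FACTOR + FACTOR / 2;
}
```

This coarse grid is why FOMO fits in under 200 KB of RAM: it predicts one class per cell instead of regressing box sizes.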

38 of 50

Main Content

0. Introduction to Machine Vision

Step 1: Preparation & Data Collection

Step 2: Train & Design Model

Step 3: Test & Deploy Your Model

39 of 50

Step 3.1 Test your Model

1. Once our model is trained, we can test it using the Live Classification tool. In the corresponding section, click the Connect a development board icon (a small MCU) and scan the QR code with your phone.

2. Once connected, you can use the smartphone to capture actual images to be tested by the trained model on Edge Impulse Studio.

3. Note that the model can produce false positives and negatives. These can be minimized by setting a proper Confidence Threshold (use the three-dots menu for the setup). Try 0.8 or higher.

40 of 50

Step 3.2 Export the Model

  1. Select the Arduino Library and Quantized (int8) model, enable the EON Compiler on the Deploy Tab, and press [Build].

41 of 50

Step 3.3 Import the Model into Arduino IDE

  1. Open your Arduino IDE and, under Sketch, go to Include Library > Add .ZIP Library. Select the file you downloaded from Edge Impulse Studio.

  2. Under the Examples tab in the Arduino IDE, you should find a sketch (esp32 > esp32_camera) under your project name.

42 of 50

Step 3.4 Change the Settings on XIAO ESP32S3

Set the XIAO ESP32S3 camera pins.

Change lines 32 to 75 of the sketch, which define the camera model and pins, to match our board. Copy the lines below and paste them over lines 32-75:

#define PWDN_GPIO_NUM  -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM  10
#define SIOD_GPIO_NUM  40
#define SIOC_GPIO_NUM  39
#define Y9_GPIO_NUM    48
#define Y8_GPIO_NUM    11
#define Y7_GPIO_NUM    12
#define Y6_GPIO_NUM    14
#define Y5_GPIO_NUM    16
#define Y4_GPIO_NUM    18
#define Y3_GPIO_NUM    17
#define Y2_GPIO_NUM    15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM  47
#define PCLK_GPIO_NUM  13

43 of 50

Step 3.5 Deploy the Model on XIAO ESP32S3

44 of 50

Thank you!

45 of 50

Let's see who achieves it first!

46 of 50

Congratulations!

47 of 50

Co-Create Gadget

48 of 50

Co-Invent Solutions

Based on various digital transformation scenarios, we continuously develop smart devices that integrate the latest technologies, and we work closely with developers and industry experts to provide software and hardware solutions for vertical industries at multiple levels.

IoT Devices

Solution

Industrial Know-How

Smart Agriculture

Smart Greenhouse

Energy Monitoring

Carbon Monitoring

Smart City

Smart Traffic

Software, Algorithms, Industrial Insights

Non-engineer Experts

Embodied AI as a bridge between science and application

49 of 50

50 of 50

Make Profit From Your Ideas with Co-Create