1 of 17

Workshop DATAI-IESE

May 6th, 2024

Diego Borro

Computer Science PhD

Vision and Robotics research line at CEIT

Research Professor at TECNUN

dborro@ceit.es

AI techniques for image processing

2 of 17

Index

  1. Context
  2. Types of problems and techniques in image processing
  3. Techniques for unveiling the black box: XAI and model calibration
  4. Projects examples and conclusions

3 of 17


Context

Deep Learning

Convolutional Neural Networks

Fully Connected Neural Networks

  • We focus mainly on Deep Learning (DL)
    • FC: Fully Connected Neural Networks
    • CNN: Convolutional Neural Networks

4 of 17


Image processing techniques

  • What we can do:
    • Deep knowledge of SoA architectures (CNN, ViT, FCN, GAN, NeRF,…)
    • Transfer learning
    • Last layers design
    • Tuning of hyperparameters and/or architecture modifications

  • For which type of problems:
    • Image classification
    • Object detection
    • Image segmentation

Problem          Image information   Complexity   Labeling
Classification   Low                 Easy         Easy
Detection        Medium              Medium       Medium
Segmentation     High                Hard         Hard

Types of problems and techniques in image processing

5 of 17


Classification problems

  • A ConvNet successfully captures the spatial dependencies in an image through the application of relevant filters
    • Feature extractor: the backbone. It can be used in any image processing problem
    • Classifier: a fully connected network (FC) designed by the developer
      • Several dense layers
      • Last layer = number of categories
  • Goal: given an image, classify the whole image into one of a set of categories (a minimal sketch follows)
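A minimal sketch of this structure, assuming PyTorch; the layer sizes, `num_classes`, and input resolution are illustrative placeholders, not values from the slides:

```python
# Minimal sketch (PyTorch assumed): a CNN backbone as feature extractor plus a
# fully connected classifier whose last layer has one output per category.
import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, num_classes):                    # num_classes: placeholder
        super().__init__()
        # Feature extractor (backbone): convolutional filters capture spatial dependencies
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Classifier (FC): several dense layers, last layer = number of categories
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.backbone(x))

logits = SimpleClassifier(num_classes=5)(torch.randn(1, 3, 64, 64))  # shape: [1, 5]
```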

Types of problems and techniques in image processing

6 of 17


Classification problems

Workflow (transfer learning pipeline):

  • Pre-training: initial dataset (ImageNet,…) with labeled data → pretrained CNN → weights optimization → initial model weights
  • Transfer learning: final layers design, network modification, hyperparameters tuning
  • Fine-tuning: application dataset with its own labeled data → CNN trained with own data → weights optimization → final model weights
  • Evaluation: test dataset → inference → confusion matrix → metrics computation
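A minimal transfer-learning sketch of this pipeline, assuming PyTorch and the torchvision ≥ 0.13 weights API; `train_loader`, `num_classes`, and the choice of ResNet-18 are illustrative assumptions:

```python
# Minimal transfer-learning sketch: reuse an ImageNet-pretrained backbone and
# retrain only a new final layer on the application dataset.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 4                                          # assumption: application categories
model = models.resnet18(weights="IMAGENET1K_V1")         # CNN pretrained on ImageNet (initial model weights)

for p in model.parameters():                             # freeze the backbone (feature extractor)
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # final layers design: new head for our categories

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:                      # application dataset (labeled data), placeholder loader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)              # weights optimization of the new head
    loss.backward()
    optimizer.step()
# the final model weights are then evaluated on the test dataset (inference, confusion matrix, metrics)
```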

Types of problems and techniques in image processing

7 of 17


Detection problems

  • Much more complex architectures (several CNNs + a region proposal network (RPN) + classifiers and regressors)
    • Regressors: compute the 4 bounding box coordinates
    • Classifier: classify each bounding box
  • Goal: given an image, detect and classify the different objects it contains (a minimal sketch follows)
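A minimal inference sketch, assuming PyTorch/torchvision; the slides do not name a specific detector, so torchvision's Faster R-CNN (CNN backbone + RPN + classification and box-regression heads) is used here only as a representative example, and `image` is a dummy placeholder:

```python
# Minimal detection sketch with a pretrained Faster R-CNN.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)                 # dummy RGB image with values in [0, 1]

with torch.no_grad():
    predictions = model([image])[0]             # one dict per input image

# Each detection: a bounding box (4 coordinates from the regressor), a class label and a score
for box, label, score in zip(predictions["boxes"], predictions["labels"], predictions["scores"]):
    if score > 0.5:
        print(label.item(), round(score.item(), 3), [round(c, 1) for c in box.tolist()])
```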

Types of problems and techniques in image processing

8 of 17


Segmentation problems

  • Fully Convolutional Networks (FCNs) are a variant of standard CNNs: they have no FC layers!
  • They are used to segment any type of object or to detect regions of interest in images
    • Convolution path (feature extractor): the backbone. It can be used in any image processing problem
    • Deconvolution path (image generator): the result is as many masks as there are categories
  • Goal: given an image, segment it into masks (assign each pixel to a category); a minimal sketch follows
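A minimal sketch, assuming PyTorch/torchvision; torchvision's pretrained FCN-ResNet50 is an illustrative choice and `image` is a dummy placeholder:

```python
# Minimal segmentation sketch with an FCN (no FC layers): the network outputs one
# mask per category and each pixel is assigned to the highest-scoring one.
import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="DEFAULT").eval()
image = torch.rand(1, 3, 256, 256)              # dummy normalized batch with one RGB image

with torch.no_grad():
    logits = model(image)["out"]                # shape [1, num_classes, H, W]: one mask per category

mask = logits.argmax(dim=1)                     # per-pixel category, shape [1, H, W]
print(mask.shape, mask.unique())
```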

Types of problems and techniques in image processing

9 of 17


Black box and Explainable AI (XAI)

  • Deep learning models = black box models
    • They are far more complex to interpret than most machine learning models (opaque nature and non-linear complexity)
    • "Perfect" input-output matching, but no direct evidence of how it is achieved
  • XAI for a better understanding of AI

Techniques for unveiling the black box: XAI and model calibration

10 of 17


Techniques for unveiling the black box: XAI and model calibration

eXplainable AI

  • Explainable Artificial Intelligence (XAI) is a concept that explains decisions made by machine learning models and provides justification in a way interpretable by humans [1]
  • XAI provides tools to visualize and understand how a complex model makes decisions, which can help "explain" these decisions in more intuitive terms

[1] S. Ali, et al., “Explainable artificial intelligence (xai): What we know and what is left to attain trustworthy artificial intelligence,” Information Fusion, p. 101805, 2023.

[2] M. T. Ribeiro, et al., “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[3] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, 2017.

In a fully connected neural network (FC), neurons learn representations and patterns that are difficult to extract and present in a human-readable form

  • LIME (Local Interpretable Model-Agnostic Explanations) [2]
  • SHAP (SHapley Additive exPlanations) [3]

They try to understand the importance of features by seeing how predictions change when input features are perturbed, removed, or changed (bias detection!); a minimal sketch of this perturbation idea follows
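The sketch below illustrates only the perturbation principle shared by LIME and SHAP, not those algorithms themselves; it assumes PyTorch, a generic classification model, and placeholder arguments:

```python
# Minimal occlusion sketch: mask patches of the input and measure how much the
# prediction for the target class drops. Larger drop = more important region.
import torch

def occlusion_importance(model, image, target_class, patch=16):
    """image: [3, H, W] tensor; returns a coarse importance map (one score per patch)."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
        _, H, W = image.shape
        heat = torch.zeros((H + patch - 1) // patch, (W + patch - 1) // patch)
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                perturbed = image.clone()
                perturbed[:, i:i + patch, j:j + patch] = 0.0      # remove this region
                prob = torch.softmax(model(perturbed.unsqueeze(0)), dim=1)[0, target_class]
                heat[i // patch, j // patch] = base - prob        # drop in confidence = importance
    return heat
```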

11 of 17


eXplainable AI

  • Explainable Artificial Intelligence (XAI) is a concept that explains decisions made by machine learning models and provides justification in a way interpretable by humans
  • XAI provides tools to visualize and understand how a complex model makes decisions, which can help "explain" these decisions in more intuitive terms

[4] M. Aouayeb, et al., “Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition,” arXiv, 2021.

  • CNNs are focused on image processing problems, so those patterns are images!
  • Convolutional layers naturally retain the spatial information of the input data
  • Shapes and patterns are detected at successive layers: deeper representations in a CNN capture high-level abstractions or visual concepts [4] (a minimal sketch follows)
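A minimal sketch of inspecting these intermediate representations, assuming PyTorch/torchvision; ResNet-18 and the chosen layers are illustrative assumptions, not the models from the slides:

```python
# Minimal layer-visualization sketch: capture feature maps of intermediate
# convolutional layers with forward hooks and compare early vs. deep layers.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model.layer1.register_forward_hook(save_activation("early"))   # low-level shapes and edges
model.layer4.register_forward_hook(save_activation("deep"))    # high-level visual concepts

with torch.no_grad():
    model(torch.rand(1, 3, 224, 224))                           # dummy image

print(activations["early"].shape)   # [1, 64, 56, 56]  -> retains spatial information
print(activations["deep"].shape)    # [1, 512, 7, 7]   -> more abstract, coarser maps
```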

Techniques for unveiling the black box: XAI and model calibration

12 of 17


Some XAI techniques for CNNs

Name                         Focus
Layer visualization          Last convolutional layer
Saliency maps [5]            Impact on the output with respect to input changes (pixels)
Grad-CAM [6]                 Impact on the output with respect to feature map (FM) changes (high-level features)
Attention maps [7]           Image areas where the model pays attention
Guided Backpropagation [8]   Impact on the output with respect to positive input changes
Integrated Gradients [9]     Impact on the output with respect to changes in N interpolated inputs (pixels)
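As one concrete example of these techniques, a minimal saliency-map sketch in the spirit of [5], assuming PyTorch; `model`, `image`, and `target_class` are placeholders for a classification model and its input:

```python
# Minimal saliency-map sketch: gradient of the class score with respect to the
# input pixels shows which pixels most affect the output.
import torch

def saliency_map(model, image, target_class):
    """image: [3, H, W]; returns a [H, W] map of per-pixel importance."""
    model.eval()
    x = image.unsqueeze(0).clone().requires_grad_(True)
    score = model(x)[0, target_class]                    # score of the target class
    score.backward()                                     # d(score) / d(pixels)
    return x.grad.abs().squeeze(0).max(dim=0).values     # max over RGB channels -> [H, W]
```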

[5] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013.

[6] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, “Grad-CAM: Why did you say that?,” arXiv preprint, Nov. 2016.

[7] A. Dosovitskiy, et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” CoRR abs/2010.11929, 2020. arXiv: 2010.11929. URL: https://arxiv.org/abs/2010.11929.

[8] J.T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: the all convolutional net”, Proceedings of the International Conference on Learning Representations (ICLR 2015).

[9] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic Attribution for Deep Networks,” Proceedings of the 34th International Conference on Machine Learning (ICML’17), vol. 70, pp. 3319-3328, Aug. 2017.

Techniques for unveiling the black box: XAI and model calibration

13 of 17


Which technique is better?

  • Different explanations often present different aspects of the model’s behavior
  • There is a lack of objectivity and quantification
  • We are working on a methodology to compare XAI techniques objectively (see the diagram and sketch below)

Techniques for unveiling the black box: XAI and model calibration

Methodology (evaluation pipeline):

  • Test dataset → CNN (black-box model) → XAI explanations (one representation per technique)
  • Convert each explanation into a mask and compare it with the test ground truth masks using IoU
  • Metrics evaluation → comparison of the XAI techniques over the test dataset results
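A minimal sketch of the IoU comparison step, assuming numpy; `gradcam_heatmap` and `ground_truth_mask` in the usage note are hypothetical placeholders:

```python
# Minimal IoU sketch for the methodology above: binarize an XAI explanation into a
# mask and compare it with the ground-truth mask.
import numpy as np

def explanation_to_mask(heatmap, threshold=0.5):
    """Convert an explanation heatmap into a binary mask via min-max normalization."""
    h = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    return h >= threshold

def iou(mask_pred, mask_gt):
    """Intersection over Union between two binary masks."""
    intersection = np.logical_and(mask_pred, mask_gt).sum()
    union = np.logical_or(mask_pred, mask_gt).sum()
    return intersection / union if union > 0 else 0.0

# Usage idea: iou(explanation_to_mask(gradcam_heatmap), ground_truth_mask) per test image,
# averaged over the test dataset for each XAI technique, to compare them objectively.
```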

14 of 17


Model calibration

  • A model is calibrated when the predicted confidence for a class corresponds to the accuracy observed at that confidence level [10]
  • With a miscalibrated model, high confidence can come with low accuracy (or the other way around), which is untrustworthy and critical in a variety of applications
  • There are techniques to measure this and calibrate models (a minimal sketch follows the reference below)

[10] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” International Conference on Machine Learning, pp. 1321-1330, 2017.
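A minimal Expected Calibration Error (ECE) sketch in the spirit of [10], assuming numpy; `probs` and `labels` are placeholder arrays from a trained classifier's test predictions:

```python
# Minimal ECE sketch: bin predictions by confidence and compare each bin's
# average confidence with its accuracy; a gap indicates miscalibration.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """probs: [N, num_classes] softmax outputs; labels: [N] true classes."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap          # weight by the fraction of samples in the bin
    return ece   # 0 means perfectly calibrated; calibration techniques aim to reduce it
```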

Techniques for unveiling the black box: XAI and model calibration

15 of 17


Projects examples

Projects examples and conclusions

16 of 17


Conclusions

Projects examples and conclusions

  • We work in any application where images are part of the inputs:
    • Image classification
    • Object detection
    • Image segmentation

  • We can work with SoA architectures or design our own

  • We can provide a better understanding of AI:
    • XAI techniques
    • Objectifying the XAI
    • Calibrating AI models

17 of 17

Workshop DATAI-IESE

May 6th, 2024

Diego Borro

Computer Science PhD

Vision and Robotics research line at CEIT

Research Professor at TECNUN

dborro@ceit.es

AI techniques for image processing