Active Learning

Segmentation

  • Segment images into regions with different semantic categories.
  • These semantic regions label and predict objects at the pixel level.

Image from http://d2l.ai/

Microstructure Segmentation

  • Structure-property linkages


Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia

[Figure: given input, fully supervised segmentation, and weakly supervised segmentation]


Weakly Supervised Learning (WSL): Phase Segmentation

[Figure: scribble annotation and the resulting phase segmentation]

Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia


WSL + Active Learning

[Figure: annotated image and segmentation over iterations 1–3]

Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia



Active Learning

[Diagram: the data set is split into a test set and Train set 1; Model 1 is trained, new data are labeled by a human or an automated process to form Train set 2 for Model 2, and so on until Model n makes the prediction on the test set]

Active Learning: Two Purposes

  1. Active Learning to Build a Good Surrogate Model (DOE)
    • Sample the unlabeled data expected to most significantly enhance the performance of the AI model
  2. Active Learning to Find a Better Solution (Optimization)
    • Identify the unlabeled data expected to have the highest label values
    • Active learning with Bayesian optimization (BO)

1) Active Learning to Build a Good AI Model


Improving Model Efficiency

  • Select the most informative data points to minimize labeling effort while maximizing learning efficiency.

[Figure: a pool of unlabeled points from which a few informative points are chosen to be labeled]

  • In two dimensions this looks very natural and easy
  • In high-dimensional space, it is hard to imagine or visualize
  • It therefore requires mathematical methods

Which Unlabeled Data Should We Sample?

  • Basic idea

[Figure: intuition for choosing which unlabeled points to query]

Uncertainty-based Sampling

  • Key idea: Sample the unlabeled data that the AI model is least certain how to label

  • Measures to quantify uncertainty
    • Least confidence
      • Select the data with the lowest maximum probability
    • Margin sampling
      • Select the data that has the smallest difference between the first and second most probable labels


AI model's predictions on two unlabeled instances:

  Instances   Label A   Label B   Label C
  d1          0.9       0.09      0.01
  d2          0.2       0.5       0.3

Here d2 is sampled by both criteria: its maximum probability (0.5) is lower than d1's (0.9), and its margin (0.5 − 0.3 = 0.2) is smaller than d1's (0.9 − 0.09 = 0.81).

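A minimal NumPy sketch of both uncertainty measures applied to the table above; both criteria pick d2:

```python
import numpy as np

# Class-probability predictions from the table (rows: d1, d2).
probs = np.array([
    [0.9, 0.09, 0.01],  # d1
    [0.2, 0.5, 0.3],    # d2
])

# Least confidence: query the instance whose top predicted probability is lowest.
least_confidence = 1.0 - probs.max(axis=1)        # [0.1, 0.5]

# Margin sampling: query the instance with the smallest gap between
# the two most probable labels.
sorted_probs = np.sort(probs, axis=1)[:, ::-1]    # descending per row
margin = sorted_probs[:, 0] - sorted_probs[:, 1]  # [0.81, 0.2]

print("query (least confidence):", np.argmax(least_confidence))  # 1 -> d2
print("query (margin):", np.argmin(margin))                      # 1 -> d2
```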

Other Sampling Methods

  • Providing labels for the unlabeled data the AI model is least certain about is not the only way to improve an AI model's performance

Query-By-Committee (QBC)

  • Involves a committee of models trained on the same training dataset
    • Instead of relying on a single AI model, use multiple AI models
  • For classification
    • Select the data on which the committee members most disagree
  • For regression
    • Select the data with the maximum prediction variance

Committee predictions (classification):

  Instances   Model 1   Model 2   Model 3
  d1          Label A   Label B   Label C
  d2          Label A   Label A   Label C

Committee predictions (regression):

  Instances   Model 1   Model 2   Model 3
  d1          0.81      0.75      0.77
  d2          0.52      0.82      0.91

Here QBC queries d1 for classification (all three members disagree) and d2 for regression (its committee variance is far larger than d1's).

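A minimal NumPy sketch of the QBC selection rules, reusing the committee outputs from the two tables above:

```python
import numpy as np

# Committee votes from the classification table (rows: d1, d2).
votes = np.array([
    ["A", "B", "C"],  # d1: full disagreement
    ["A", "A", "C"],  # d2: partial agreement
])

def vote_entropy(row):
    """Disagreement measure: entropy of the committee's label votes."""
    _, counts = np.unique(row, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

disagreement = np.array([vote_entropy(r) for r in votes])
print("classification query:", np.argmax(disagreement))  # 0 -> d1

# Committee predictions from the regression table.
preds = np.array([
    [0.81, 0.75, 0.77],  # d1
    [0.52, 0.82, 0.91],  # d2
])
print("regression query:", np.argmax(preds.var(axis=1)))  # 1 -> d2
```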

Expected Model Change

  • Select the data that would cause the most significant change to the current model if we knew its label
  • The significant 'change' is estimated from the gradient of the loss that a candidate example would induce (e.g., its expected gradient length)
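A minimal sketch of expected model change for a logistic-regression model, scoring each candidate by the expected norm of the loss gradient it would induce; since the label is unknown, the norm is averaged over the model's own predicted label probabilities. All names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_gradient_length(w, X_pool):
    """Score each candidate by its expected loss-gradient norm."""
    scores = []
    for x in X_pool:
        p = sigmoid(w @ x)
        # Logistic-loss gradient w.r.t. w is (p - y) * x for label y in {0, 1};
        # average its norm over the predicted label distribution.
        egl = (1 - p) * np.linalg.norm((p - 0) * x) + p * np.linalg.norm((p - 1) * x)
        scores.append(egl)
    return np.array(scores)

rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # current model parameters
X_pool = rng.normal(size=(5, 3))       # unlabeled candidates
print("query:", np.argmax(expected_gradient_length(w, X_pool)))
```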

Core-Set

  • Find a 'core-set' that covers the entire unlabeled dataset and query it
  • Core-set: a set of data points that represents all data points within a distance δ in the latent space
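A minimal sketch of core-set selection via the greedy k-center heuristic (a common approximation), assuming the candidates are embedded in a latent feature space:

```python
import numpy as np

def greedy_k_center(features, n_query, seed_idx=0):
    """Repeatedly pick the point farthest from the current set,
    shrinking the covering radius delta."""
    selected = [seed_idx]
    # Distance from every point to its nearest selected center.
    dists = np.linalg.norm(features - features[seed_idx], axis=1)
    for _ in range(n_query - 1):
        idx = int(np.argmax(dists))  # farthest point becomes the next center
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return selected

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 16))  # e.g., penultimate-layer embeddings
print(greedy_k_center(latent, n_query=10))
```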

Active Learning

  • Selectively label only the data that can improve the performance of the AI model
    • Minimizing the number of labeling steps while…
    • Maximizing the performance of the AI model
  • Iteratively sample unlabeled data for labeling and update the training database
    • As iterations are repeated, the performance of the AI model improves

[Diagram: active learning loop — Training Dataset → AI Model → Sampling from the Unlabeled Dataset → Labelers (Experiments) → new labels added to the Training Dataset]

Step 1

  • Train the AI model with the training dataset

Step 2

  • Predict labels of the unlabeled data with the trained AI model

Step 3

  • Sample some of the unlabeled data that would improve the prediction performance of the AI model
    • The number of unlabeled data points to sample is up to the user
    • Various methods exist to measure which unlabeled data are effective in enhancing model performance

Step 4

  • Perform experiments or hire experts to label the sampled unlabeled data

Step 5

  • Add the newly labeled data to the training dataset
    • Repeat steps 1 to 5 until the AI model reaches the required performance

These five steps amount to the short loop sketched below.
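A minimal Python sketch of this loop, assuming scikit-learn and least-confidence sampling; the `oracle_label` function is a hypothetical stand-in for the human labelers or experiments:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_train, y_train, X_pool, oracle_label,
                         n_rounds=10, batch_size=5):
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X_train, y_train)                    # Step 1: train the model
        probs = model.predict_proba(X_pool)            # Step 2: predict unlabeled data
        confidence = probs.max(axis=1)
        query = np.argsort(confidence)[:batch_size]    # Step 3: sample least-confident
        y_new = oracle_label(X_pool[query])            # Step 4: label (experiments)
        X_train = np.vstack([X_train, X_pool[query]])  # Step 5: grow the train set
        y_train = np.concatenate([y_train, y_new])
        X_pool = np.delete(X_pool, query, axis=0)
    return model
```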

Let’s Talk about Ensembles

Ensemble with Different Models

  • Ensembles: collections of predictors
    • Combine predictions to improve performance

[Diagram: one train set feeds models 1…n; results 1…n are combined into the prediction, evaluated on the test set]

Ensemble with Different Training Sets

  • Bagging
    • Parallel: the same model type is trained independently on train sets 1…n

[Diagram: the data set yields train sets 1…n; identical models are trained in parallel and results 1…n are combined into the prediction]

Ensemble with Different Training Sets

  • Boosting
    • Sequential: each model is trained on a train set informed by the previous model

[Diagram: Train set 1 → Model 1 → Train set 2 → Model 2 → … → Model n → prediction on the test set]
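For reference, a minimal scikit-learn sketch contrasting the two styles on assumed toy data (an illustration, not from the slides):

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: trees trained in parallel on bootstrap resamples of the train set.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_tr, y_tr)

# Boosting: trees trained sequentially, each focusing on previous errors.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print("bagging accuracy:", bagging.score(X_te, y_te))
print("boosting accuracy:", boosting.score(X_te, y_te))
```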

Space vs. Time

2) Active Learning for Optimization

Objective

  • Similar to traditional active learning in that it samples from an unlabeled dataset, but…
    • Objective: sample the unlabeled data expected to have the highest label values
      • Example: sampling unlabeled process parameters to maximize productivity

[Figure: purpose of sampling]

Which One is Better?

[Figure: two candidate points, A and B, compared as the next sample]

Active Learning for Optimization

  • Two important components
    • Surrogate model
      • Returns predicted labels and the uncertainties of those predictions
    • Utility function
      • Uses the surrogate's outputs to measure which unlabeled data are more likely to have higher label values than the currently known labels

[Diagram: Training Dataset → Surrogate Model → Sampling with Utility Function from the Unlabeled Dataset → Labelers (Experiments) → new labels added to the Training Dataset]

Surrogate Model

  • 'Surrogates' the time-consuming and costly labeling process by predicting the label value of each unlabeled data point
    • It returns predicted labels and the uncertainties of those predictions
    • The uncertainty values are necessary inputs for the utility function, so the surrogate is usually a probabilistic model


Surrogate Model

  • Widely used probabilistic models: Gaussian process regression (GPR) and Bayesian neural networks (BNN)

Gaussian Process Regression

  • GPR derives a distribution over functions that map inputs to label values, conditioned on the training data
    • The forms of these functions are governed by a kernel function

Bayesian Neural Network

  • BNNs treat the weights of a neural network as distributions rather than fixed values
    • This lets the network express uncertainty in its predictions by considering a range of possible weights
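A minimal sketch of a GPR surrogate with scikit-learn; the key point is that `predict(..., return_std=True)` returns both the mean prediction and its uncertainty, which the utility function will consume. The sine data are an assumed stand-in for measured labels:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(8, 1))   # labeled inputs
y_train = np.sin(X_train).ravel()           # stand-in for measured label values

# The RBF kernel defines the form of the functions GPR considers.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X_train, y_train)

X_pool = np.linspace(0, 10, 100).reshape(-1, 1)   # unlabeled candidates
mu, sigma = gpr.predict(X_pool, return_std=True)  # predicted labels + uncertainties
```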

Step 1

  • Train the surrogate model with the training dataset

Step 2

  • Predict labels of the unlabeled data with the trained surrogate model
    • Obtain the uncertainties of those predictions as well

Step 3

  • Use the utility function to sample unlabeled data likely to possess higher label values than the currently known labels
    • The number of unlabeled data points to sample is up to the user
    • Various utility functions exist to measure which unlabeled data are more likely to have higher label values than the currently known labels

Step 4

  • Perform experiments or hire experts to label the sampled unlabeled data

Step 5

  • Check whether any newly labeled data have label values higher than the previous best, and record those values
  • Add the newly labeled data to the training dataset
    • Repeat steps 1 to 5 until the highest label value reaches a predefined condition

These five steps amount to the short loop sketched below.
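A minimal Python sketch of this optimization loop, assuming scikit-learn's GPR as the surrogate and an upper-confidence-bound utility; `run_experiment` is a hypothetical stand-in for the labelers or experiments:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def optimize(X_train, y_train, X_pool, run_experiment, n_rounds=20, kappa=2.0):
    gpr = GaussianProcessRegressor()
    best = y_train.max()
    for _ in range(n_rounds):
        gpr.fit(X_train, y_train)                         # Step 1: train surrogate
        mu, sigma = gpr.predict(X_pool, return_std=True)  # Step 2: predictions + uncertainty
        query = int(np.argmax(mu + kappa * sigma))        # Step 3: utility function (UCB)
        y_new = run_experiment(X_pool[query])             # Step 4: label by experiment
        best = max(best, y_new)                           # Step 5: record best, update data
        X_train = np.vstack([X_train, X_pool[query:query + 1]])
        y_train = np.append(y_train, y_new)
        X_pool = np.delete(X_pool, query, axis=0)
    return best
```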

How Do Utility Functions Decide Which Unlabeled Data to Sample?

Utility Function: Two Important Ideas

  • Exploitation: next, search the neighborhood of the point with the maximum function value among the points investigated so far
  • Exploration: next, search the neighborhood of the point with the maximum standard deviation
  • Strategies
    • Focusing solely on exploitation can lead to neglecting data in unexplored areas
    • Conversely, focusing solely on exploration can lead to neglecting data in areas that have already demonstrated high label values
    • Using a combination of exploration and exploitation as the utility function is an effective strategy to mitigate this trade-off

[Figure: candidate points A and B, with the agent choosing A under exploitation and B under exploration]

Utility Function

  • The utility function strategically selects unlabeled data that are more likely to have higher label values than the currently known labels, using exploitation, exploration, or a combination of both

[Figure: surrogate predictions and uncertainties for two candidate instances, x1 and x2]

Probability of Improvement (PI)

  • A purely exploitative strategy: select the candidate most likely to exceed the current best observed value f⁺

    PI(x) = Φ((μ(x) − f⁺) / σ(x))

  where μ(x) and σ(x) are the surrogate's mean and standard deviation and Φ is the standard normal CDF.

Expected Improvement (EI)

  • Fuses an exploration strategy into PI
    • Weights the PI value by the difference between the current max value and the mean prediction
    • The probability of obtaining a label larger than the existing points is important, but it is also very important how large the obtained value is

    EI(x) = (μ(x) − f⁺) Φ(Z) + σ(x) φ(Z),  with Z = (μ(x) − f⁺) / σ(x)

  The first term rewards exploitation (large expected improvement), and the second rewards exploration (large uncertainty).

Upper Confidence Bound (UCB)

  • Balances the mean prediction and the uncertainty
  • Favors uncertainty under the assumption that higher uncertainty hides a potentially higher reward

    UCB(x) = μ(x) + κ σ(x)
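As a concrete reference, a minimal Python sketch of the three utility functions over the surrogate's outputs, assuming NumPy and SciPy; `mu` and `sigma` would come from a surrogate such as the GPR above, and `best` is the highest label value observed so far (f⁺):

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    return norm.cdf(z)

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    # First term: exploitation; second term: exploration.
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # kappa trades off exploration against exploitation (assumed default).
    return mu + kappa * sigma

# The candidate to label next is the argmax of the chosen utility, e.g.:
# query = int(np.argmax(expected_improvement(mu, sigma, best)))
```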

Expected Improvement (EI)

[Figures: Expected Improvement illustrated step by step over successive iterations]

Summary

  • Main objectives of active learning with Bayesian optimization
    • Maximize the label value by labeling previously unlabeled data
    • Minimize the number of labeling steps

  • Key idea: sample the unlabeled data that are expected to most significantly enhance label values

  • Choose the right active learning framework for your purpose