1 of 25

⚡️ Lightning Talks #1 🚀

one slide in one minute, let's go!

2 of 25

2.73×–7.63× speedup while retaining 98.6%–99.6% of the original accuracy!

Goal: accelerate the self-attention computation, especially when the sequence length is long.

3 of 25

4 of 25

Safe Learning in the Real World via Adaptive Shielding with Hamilton-Jacobi Reachability

Michael Lu, Jashanraj Singh Gosain, Luna Sang, Mo Chen

Problem: Standard RL algorithms can be unsafe during training, since they may violate safety constraints while they learn

Solution: Construct a "safety filter" using Hamilton-Jacobi Reachability that works with any off-policy RL algorithm

Contribution: Our shield adapts to model uncertainty. When the real system differs from our model, the safety filter becomes more conservative

Results: Fewer safety violations compared to fixed safety filters that don't adapt. Successfully tested on navigational tasks with minimal human intervention

Michael Lu
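As a rough illustration of how such a shield can wrap any off-policy RL agent, here is a minimal sketch: a hypothetical `hj_value` function stands in for a precomputed Hamilton-Jacobi value function, and the filter overrides the agent's action when the safety margin, inflated by a model-uncertainty term, gets too small. All names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hj_value(state):
    """Hypothetical stand-in for a precomputed Hamilton-Jacobi value
    function: positive means safe, <= 0 means the unsafe set is reachable."""
    return 1.0 - np.linalg.norm(state)  # toy example: safe inside the unit ball

def safe_action(state):
    """Stand-in for the safe controller derived from the HJ value function."""
    return -np.asarray(state)  # toy example: steer back toward the origin

def shielded_step(agent_action, state, model_uncertainty):
    # Inflate the safety margin by the current model-uncertainty estimate,
    # so the filter becomes more conservative when the model is less trusted.
    margin = hj_value(state) - model_uncertainty
    if margin <= 0.0:
        return safe_action(state)   # override: fall back to the safe controller
    return agent_action             # otherwise let the RL agent act freely
```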

5 of 25

3D visual grounding models perform well on existing datasets…

…but do these datasets really capture the full range of visual grounding descriptions?

ScanRefer: "there is a microwave on a countertop. it is in the corner."

ViGiL3D: "Grab the box on the counter that appears taller than the rest."

Come learn about how ViGiL3D helps us understand and improve 3DVG models through diverse language!

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding

6 of 25

Eye-Opening Open-Set Adaptation Analysis

Zefeng Li12, Evan Shelhamer12

UBC1 Vector Institute2

How to make models adapt to unknown shifts and classes?

Test-time Adaptation: first train the model on standard/clean data, then adapt on test data that is new and different. Example test data: image corruptions like noise, blur, weather, and digital artifacts.

Open-Set Experiments: data from unknown classes, e.g., houses in the testing data but only outdoors in the training data. Closed-set (In Distribution = InD) vs. open-set (Out of Distribution = OOD). Batch mixtures vary from 100% InD to 100% OOD, with more/less InD vs. OOD per batch.

Continual Adaptation: shifts that change over time, e.g., Snow/InD, then Brightness/InD, then Contrast/OOD.
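For context on the setting, here is a minimal sketch of one standard test-time adaptation baseline, Tent-style entropy minimization, which updates only normalization parameters on unlabeled test batches. This illustrates the protocol being analyzed, not necessarily the analysis code used here.

```python
import torch
import torch.nn.functional as F

def configure_for_tta(model):
    """Freeze everything except the affine parameters of norm layers."""
    model.train()  # use test-batch statistics at adaptation time
    params = []
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            for p in m.parameters():
                p.requires_grad_(True)
                params.append(p)
        else:
            for p in m.parameters(recurse=False):
                p.requires_grad_(False)
    return params

def adapt_on_batch(model, optimizer, x):
    """One Tent-style update: minimize prediction entropy on a test batch."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.log().clamp(min=-100)).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()
```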

7 of 25

8 of 25

Existing generative models fail on small datasets

Check out our poster: Rejection Sampling IMLE

[Figure: qualitative comparison on Toy 2D and FFHQ-100: the dataset vs. samples from diffusion models, a GAN, and RS-IMLE (ours)]

9 of 25

Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defence

Saiyue Lyu1*, Shadab Shaikh1*, Frederick Shpilevskiy1*, Evan Shelhamer2, Mathias Lécuyer1

University of British Columbia1 Google DeepMind2

  • Reconnect RS to Differential Privacy (DP) theoretically
  • Adapt RS to inputs during testing with theoretically-sound updates
  • Improve test accuracy for image classification
  • Enlarge the certification radius

[Figure: CelebA results]

Paper · Code
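For background, a minimal sketch of the standard randomized-smoothing certification recipe (in the style of Cohen et al.): estimate the smoothed classifier's top-class probability from Gaussian-noise samples, lower-bound it, and convert the bound into a certified L2 radius. This is the generic procedure, not this paper's adaptive variant; `model`, `n`, and `alpha` are illustrative.

```python
import numpy as np
from scipy.stats import norm

def certify(model, x, sigma, n=1000, alpha=0.001):
    """Generic randomized-smoothing certificate (not the adaptive variant).

    model(x_noisy) -> predicted class id; x is a single input array.
    """
    # Sample predictions under Gaussian input noise.
    counts = {}
    for _ in range(n):
        y = model(x + sigma * np.random.randn(*x.shape))
        counts[y] = counts.get(y, 0) + 1
    top_class, top_count = max(counts.items(), key=lambda kv: kv[1])

    # Hoeffding lower bound on the top-class probability (holds w.p. 1 - alpha).
    p_lower = top_count / n - np.sqrt(np.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: cannot certify
    radius = sigma * norm.ppf(p_lower)  # certified L2 radius
    return top_class, radius
```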

10 of 25

Adaptive Diffusion Denoised Smoothing: Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion

Frederick Shpilevskiy1, Saiyue Lyu1, Krishnamurthy Dj Dvijotham2, Mathias Lécuyer1, Pierre-Andre Noel2

  • Model diffusion as a sequence of GDP mechanisms with data-adaptive variance.
  • Extend Adaptive RS with privacy filters to provide an end-to-end certification analysis.
  • Improve test accuracy for image classification.

PUT Workshop Oral
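As a small grounding example of the GDP machinery the first bullet relies on: a Gaussian mechanism with L2 sensitivity Δ and noise scale σ is μ-GDP with μ = Δ/σ, and a sequence of such mechanisms composes to sqrt(Σ μ_i²)-GDP. A minimal sketch, where the per-step, data-adaptive noise scales are made-up inputs:

```python
import numpy as np

def step_mu(sensitivity, sigma):
    # A Gaussian mechanism with L2 sensitivity and noise scale sigma is mu-GDP.
    return sensitivity / sigma

def composed_mu(sensitivities, sigmas):
    """Compose a sequence of mu_i-GDP Gaussian mechanisms: sqrt(sum mu_i^2)."""
    mus = np.array([step_mu(s, sg) for s, sg in zip(sensitivities, sigmas)])
    return float(np.sqrt(np.sum(mus ** 2)))

# Illustrative only: noisier steps contribute smaller mu_i to the total.
sigmas = np.linspace(0.2, 1.0, 10)   # data-adaptive noise scales (made up)
print(composed_mu(np.ones(10), sigmas))
```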


11 of 25

12 of 25

13 of 25

14 of 25

ICML 2025: Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Sparse Training and the Lottery Ticket Hypothesis (LTH): LTH showed that there exists a sparse sub-network that can match dense performance. However, finding the LTH sparse mask is computationally expensive, and the mask doesn't work with new random inits.

Research Question: How can we train an LTH mask from a different random initialization while maintaining good generalization?

Method: "We found that LTH masks fail to generalize to new random initializations due to loss basin misalignment. To reuse an LTH mask with a different random initialization, we leverage permutation symmetries to permute the mask to align with the new random initialization's optimization basin."

[Figure: visualization of a 2D loss landscape; (left) dense training & pruning, (right) permuting the mask enables sparse training.]

Result: At sparsities of 0.80, 0.90, 0.95, and 0.97, a sparse model with the permuted mask and a new random initialization can nearly match LTH solution performance. We show that by leveraging weight symmetries, the LTH mask can be used with new random inits.
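A minimal sketch of the core operation: permuting a layer's pruning mask with the unit permutations found by a weight-matching procedure. The matching itself (e.g., Git Re-Basin-style alignment) is omitted, and `perm_out`/`perm_in` are assumed given.

```python
import numpy as np

def permute_mask(mask, perm_out, perm_in):
    """Apply output/input unit permutations to a layer's binary pruning mask.

    mask: (out_features, in_features) 0/1 array from the LTH ticket.
    perm_out, perm_in: permutations aligning this layer's units with the
    new random initialization's optimization basin.
    """
    return mask[np.ix_(perm_out, perm_in)]

# Consistency matters: the input permutation of layer l must equal the
# output permutation of layer l-1, so the permuted masks still describe
# the same sparse sub-network up to a relabeling of hidden units.
```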

15 of 25

Chehre: Understanding the Language of Facial Expressions

Bita Azari, Zoe Stanley, Avneet Batra, Poorvi Bhatia, Hali Kil, Manolis Savva, Angelica Lim

Simon Fraser University (SFU)

[Figure: example facial expressions with annotations such as "Flattered", "saddened", and "I'm mad!"; study prompts: "What does this emoji mean to you?" and "Please express the emoji."]

Contact Me: bazari@sfu.ca

16 of 25

What can satellite imagery and machine learning measure?

Jonathan Proctor, Tamma Carleton, Trinetta Chong, Taryn Fransen, Simon Greenhill, Jessica Katz, Hikari Murayama, Luke Sherman, Jeanette Tseng, Hannah Druckenmiller, Solomon Hsiang

Approach: We conduct 115 standardized large-scale experiments using a composite high-resolution optical image of Earth and a generalizable and accessible SIML technology to evaluate which ground conditions can be accurately measured and where this technology struggles.

Results

  • Imagery explains 45% of the spatial variation in ground measurements across all 115 experiments. Predictive power is notably high for many variables that have never before been predicted at scale from imagery, such as vehicle ownership (R2=0.66), agricultural involvement (R2=0.62), and life expectancy (R2=0.72).
  • SIML technology is equally valuable for monitoring human systems as it is for natural systems. While historically emphasis has been placed on remotely sensing natural systems, we show that socio-economic variables are, in general, just as clearly visible from imagery as environmental variables.
  • Direct visual representation is critical for SIML technology's success. Variables that are directly represented in the imagery (e.g., forest cover) are easier to predict, especially when they are geographically far from training data, than variables that are indirectly represented (e.g., income).
  • SIML success varies systematically with income and population density, underscoring inequities in access to and in the likely development of remote monitoring systems that are useful in practice.
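To make the evaluation concrete: each experiment amounts to regressing a ground-measured variable on features extracted from imagery and scoring out-of-sample R2. A minimal sketch with scikit-learn, where `image_features` and `ground_truth` are placeholders for the per-location feature matrix and labels, not the study's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
image_features = rng.normal(size=(1000, 256))   # placeholder imagery features
ground_truth = rng.normal(size=1000)            # placeholder ground measurements

# Ridge regression with cross-validated regularization strength.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
r2_scores = cross_val_score(model, image_features, ground_truth,
                            cv=5, scoring="r2")
print(f"mean out-of-sample R2: {r2_scores.mean():.2f}")
```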

17 of 25

Reliable ML against Faulty Training Data

Problem: Faulty training data, e.g., mislabeled images in AVs ("Label: Stop sign") and in medical imaging ("Label: Pneumonia").

Insight: Diversity increases reliability!

Our Solutions: ensembles, dynamic weights using XAI, and diversity-guided ensemble search.

Results: the solutions are 24% and 28% more resilient, with the strongest being the most resilient at minimal effort.

Abraham Chan, UBC (abrahamc@ece.ubc.ca)
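A minimal sketch of the underlying idea that diverse ensembles tolerate faulty labels: train several different model families on the same (possibly mislabeled) data and combine them by majority vote. Purely illustrative, not the paper's search procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, y_train = X[:1500], y[:1500].copy()
X_test, y_test = X[1500:], y[1500:]

# Inject label faults into 20% of the training data.
flip = np.random.default_rng(0).random(len(y_train)) < 0.2
y_train[flip] = 1 - y_train[flip]

# A diverse ensemble: different inductive biases, same faulty data.
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=0),
          KNeighborsClassifier()]
preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])
vote = (preds.mean(axis=0) > 0.5).astype(int)   # majority vote
print("ensemble accuracy:", (vote == y_test).mean())
```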

18 of 25

Lifelong Learned Video Diffusion Models Work

  • Task: Future frame generation by streaming training on a one-million-frame-long video

  • Algorithm: Experience replay
  • Experiments: Four curated video datasets

Yoo, J., He, Y., Naderiparizi, S., Green, D., van de Ven, G. M., Pleiss, G., Wood, F. (2024). Lifelong Learning of Video Diffusion Models From a Single Video Stream. arXiv preprint arXiv:2406.04814.

[Figure: ground truth vs. generated samples 1 and 2 for Datasets 1 and 2]

Paper
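A minimal sketch of the experience-replay training loop named above: stream the frames once, keep a buffer of past clips (a reservoir buffer is one common choice; the paper's exact buffer policy may differ), and mix replayed clips into each diffusion update. `diffusion_train_step` is a placeholder for the model's actual loss and update.

```python
import random

def reservoir_add(buffer, item, seen, capacity=10_000):
    """Reservoir sampling keeps a uniform sample of the stream seen so far."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randrange(seen)
        if j < capacity:
            buffer[j] = item

def train_on_stream(stream, diffusion_train_step, batch_size=16):
    buffer, seen = [], 0
    for clip in stream:                        # a single pass over the video
        seen += 1
        replay = random.sample(buffer, min(batch_size - 1, len(buffer)))
        diffusion_train_step([clip] + replay)  # mix new data with replay
        reservoir_add(buffer, clip, seen)
```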

19 of 25

We Define and Analyze Memorization in Novel Models

20 of 25

When Backtracking isn’t Reasoning

Yunpeng Liu1,2 Berend Zwartsenberg1 Frank Wood1,2,3

Do imitative backtracking reasoning models improve reasoning scores by making corrections?

  • We found evidence that the backtracking reasoning model is making use of additional computation while effectively ignoring the contents of its “mistaken reasoning steps.”

[Figure: loss mask for imitative backtracking]

Random replacements yield performance similar to model-generated, human-like mistaken reasoning steps.

The reasoning score increases as the number of random reasoning attempts followed by BACK tokens increases.
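A minimal sketch of the random-replacement control described above: take a backtracking trace and substitute the tokens of the mistaken attempts (everything before the last BACK token) with random vocabulary tokens, leaving the BACK tokens and the final answer intact. The token layout and names are illustrative assumptions, not the paper's exact setup.

```python
import random

BACK = -1  # illustrative sentinel id for the BACK token

def randomize_mistaken_steps(tokens, vocab_size):
    """Replace mistaken-attempt tokens with random ids, keeping BACK tokens
    and everything after the last BACK (the final answer) unchanged."""
    backs = [i for i, t in enumerate(tokens) if t == BACK]
    if not backs:
        return tokens
    last_back = backs[-1]
    out = []
    for i, t in enumerate(tokens):
        if i < last_back and t != BACK:
            out.append(random.randrange(vocab_size))  # scramble the content
        else:
            out.append(t)                             # keep BACK + answer
    return out
```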

Inverted AI1, University of British Columbia2, Amii3

yunpengl@cs.ubc.ca

21 of 25

Visual-Concepts-Driven Image Generation

22 of 25

  • Time-aligned, video, audio, action data
  • 10000+ hours
  • 10000+ players across the globe
  • Social interactions
  • Continuous world state

How can we create embodied artificial intelligence agents capable of interacting naturally and purposefully with humans in complex, open-ended environments?

Ongoing: Developing real-time EAI agents

  • Action-conditioned world model: diffusion model with long history
  • Agents: VLM, …

PLAICRAFT.AI

(Join & Plai!)

https://blog.plaicraft.ai/

23 of 25

Solving Canada’s Technical AI Safety Talent Gap

Situation

  • 79% of Canadians are concerned about the negative outcomes of AI
  • Almost half (48%) of managers and executives feel their employees are not or barely prepared to use AI
  • Only 24% of Canadians received any AI training
  • Canada ranked 44th out of 47 in AI literacy

Our Proposal

    • Inspired by LISA’s proven ARENA
    • 4-6 week immersive cohorts
    • Hands-on AI safety engineering skills, safety audits & interpretability
    • Includes human-centric/data-centric AI and indigenous perspectives

Strategic Benefits

    • Helps build Canadian technical sovereignty in AI
    • Developing AI and technical AI safety talent for academia, industry and government
    • Fits into the Pan-Canadian Artificial Intelligence Strategy, Pillar 3: Talent and Research

Cole Thacker

cole_thacker@sfu.ca

24 of 25

Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow

25 of 25

Multi-Modeling for Efficiency, Adaptivity, and Uncertainty

LookWhere: Efficiency with Self-Supervised Adaptive Computation

A. Fuller*1, Y. Yassin*1, J. Wen1, D.G. Kyrollos1, T. Ibrahim1, J.R. Green1, E. Shelhamer123

Carleton1 UBC2 Vector Institute3

Anthony Fuller, Carleton PhD student + Vector intern. Poster Session #1.

Asymmetric Duos: Uncertainty with the Help of a Sidekick

T.G. Zhou1, E. Shelhamer12, G. Pleiss12

UBC1 Vector Institute2

Tim G. Zhou, UBC MSc student (incoming). Away, another time!

Open-Set Updates: Adaptivity with More Analysis and Benchmarks (known classes + unknowns)

Zefeng Li12, E. Shelhamer12

UBC1 Vector Institute2

Zefeng Li, UBC PhD student + Vector student. Poster Session #1.