1 of 25

⚡️ Lightning Talks #1 🚀

one slide in one minute, let's go!

2 of 25

2.73×–7.63× speedup while retaining 98.6%–99.6% of the original accuracy!

Goal: accelerate the self-attention computation, especially when the sequence length is long.

3 of 25

4 of 25

Safe Learning in the Real World via Adaptive Shielding with Hamilton-Jacobi Reachability

Michael Lu, Jashanraj Singh Gosain, Luna Sang, Mo Chen

Problem: Standard RL algorithms can be unsafe during training, since they may violate safety constraints while they learn

Solution: Construct a "safety filter" using Hamilton-Jacobi Reachability that works with any off-policy RL algorithm

Contribution: Our shield adapts to model uncertainty. When the real system differs from our model, the safety filter becomes more conservative

Results: Fewer safety violations compared to fixed safety filters that don't adapt. Successfully tested on navigational tasks with minimal human intervention

Michael Lu
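As a rough illustration of how such a shield can wrap any off-policy RL agent, here is a minimal sketch: a hypothetical `hj_value` function stands in for a precomputed Hamilton-Jacobi value function, and the filter overrides the agent's action when the safety margin, inflated by a model-uncertainty term, gets too small. All names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hj_value(state):
    """Hypothetical stand-in for a precomputed Hamilton-Jacobi value
    function: positive means safe, <= 0 means the unsafe set is reachable."""
    return 1.0 - np.linalg.norm(state)  # toy example: safe inside the unit ball

def safe_action(state):
    """Stand-in for the safe controller derived from the HJ value function."""
    return -np.asarray(state)  # toy example: steer back toward the origin

def shielded_step(agent_action, state, model_uncertainty):
    # Inflate the safety margin by the current model-uncertainty estimate,
    # so the filter becomes more conservative when the model is less trusted.
    margin = hj_value(state) - model_uncertainty
    if margin <= 0.0:
        return safe_action(state)   # override: fall back to the safe controller
    return agent_action             # otherwise let the RL agent act freely
```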

5 of 25

3D visual grounding models perform well on existing datasets…

…but do these datasets really capture the full range of visual grounding descriptions?

ScanRefer: "there is a microwave on a countertop. it is in the corner."

ViGiL3D: "Grab the box on the counter that appears taller than the rest."

Come learn about how ViGiL3D helps us understand and improve 3DVG models through diverse language!

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding

6 of 25

Eye-Opening Open-Set Adaptation Analysis

Zefeng Li12, Evan Shelhamer12

UBC1 Vector Institute2

How to make models adapt to unknown shifts and classes?

Test-time Adaptation: first train the model on standard/clean data, then adapt on test data that is new and different. Example test data: image corruptions like noise, blur, weather, and digital artifacts.

Open-Set Experiments: data from unknown classes, e.g., houses in the testing data but only outdoors in the training data. Closed-set (In Distribution = InD) vs. open-set (Out of Distribution = OOD). Batch mixtures vary from 100% InD to 100% OOD, with more/less InD vs. OOD per batch.

Continual Adaptation: shifts that change over time, e.g., Snow/InD, then Brightness/InD, then Contrast/OOD.
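For context on the setting, here is a minimal sketch of one standard test-time adaptation baseline, Tent-style entropy minimization, which updates only normalization parameters on unlabeled test batches. This illustrates the protocol being analyzed, not necessarily the analysis code used here.

```python
import torch
import torch.nn.functional as F

def configure_for_tta(model):
    """Freeze everything except the affine parameters of norm layers."""
    model.train()  # use test-batch statistics at adaptation time
    params = []
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            for p in m.parameters():
                p.requires_grad_(True)
                params.append(p)
        else:
            for p in m.parameters(recurse=False):
                p.requires_grad_(False)
    return params

def adapt_on_batch(model, optimizer, x):
    """One Tent-style update: minimize prediction entropy on a test batch."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.log().clamp(min=-100)).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()
```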

7 of 25

8 of 25

Existing generative models fail on small datasets

Check out our poster: Rejection Sampling IMLE

[Figure: qualitative comparison on Toy 2D and FFHQ-100: the dataset vs. samples from diffusion models, a GAN, and RS-IMLE (ours)]

9 of 25

Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defence

Saiyue Lyu1*, Shadab Shaikh1*, Frederick Shpilevskiy1*, Evan Shelhamer2, Mathias Lécuyer1

University of British Columbia1 Google DeepMind2

  • Reconnect RS to Differential Privacy (DP) theoretically
  • Adapt RS to inputs during testing with theoretically-sound updates
  • Improve test accuracy for image classification
  • Enlarge the certification radius

[Figure: CelebA results]

Paper · Code
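For background, a minimal sketch of the standard randomized-smoothing certification recipe (in the style of Cohen et al.): estimate the smoothed classifier's top-class probability from Gaussian-noise samples, lower-bound it, and convert the bound into a certified L2 radius. This is the generic procedure, not this paper's adaptive variant; `model`, `n`, and `alpha` are illustrative.

```python
import numpy as np
from scipy.stats import norm

def certify(model, x, sigma, n=1000, alpha=0.001):
    """Generic randomized-smoothing certificate (not the adaptive variant).

    model(x_noisy) -> predicted class id; x is a single input array.
    """
    # Sample predictions under Gaussian input noise.
    counts = {}
    for _ in range(n):
        y = model(x + sigma * np.random.randn(*x.shape))
        counts[y] = counts.get(y, 0) + 1
    top_class, top_count = max(counts.items(), key=lambda kv: kv[1])

    # Hoeffding lower bound on the top-class probability (holds w.p. 1 - alpha).
    p_lower = top_count / n - np.sqrt(np.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: cannot certify
    radius = sigma * norm.ppf(p_lower)  # certified L2 radius
    return top_class, radius
```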

10 of 25

Adaptive Diffusion Denoised Smoothing: Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion

Frederick Shpilevskiy1, Saiyue Lyu1, Krishnamurthy Dj Dvijotham2, Mathias Lécuyer1, Pierre-Andre Noel2

  • Model diffusion as a sequence of GDP mechanisms with data-adaptive variance.
  • Extend Adaptive RS with privacy filters to provide an end-to-end certification analysis.
  • Improve test accuracy for image classification.

PUT Workshop Oral
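As a small grounding example of the GDP machinery the first bullet relies on: a Gaussian mechanism with L2 sensitivity Δ and noise scale σ is μ-GDP with μ = Δ/σ, and a sequence of such mechanisms composes to sqrt(Σ μ_i²)-GDP. A minimal sketch, where the per-step, data-adaptive noise scales are made-up inputs:

```python
import numpy as np

def step_mu(sensitivity, sigma):
    # A Gaussian mechanism with L2 sensitivity and noise scale sigma is mu-GDP.
    return sensitivity / sigma

def composed_mu(sensitivities, sigmas):
    """Compose a sequence of mu_i-GDP Gaussian mechanisms: sqrt(sum mu_i^2)."""
    mus = np.array([step_mu(s, sg) for s, sg in zip(sensitivities, sigmas)])
    return float(np.sqrt(np.sum(mus ** 2)))

# Illustrative only: noisier steps contribute smaller mu_i to the total.
sigmas = np.linspace(0.2, 1.0, 10)   # data-adaptive noise scales (made up)
print(composed_mu(np.ones(10), sigmas))
```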


11 of 25

12 of 25

13 of 25

14 of 25

ICML 2025: Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Sparse Training and the Lottery Ticket Hypothesis (LTH): LTH showed that there exists a sparse sub-network that can match dense performance. However, finding the LTH sparse mask is computationally expensive, and the mask doesn't work with new random inits.

Research Question: How can we train an LTH mask from a different random initialization while maintaining good generalization?

Method: "We found that LTH masks fail to generalize to new random initializations due to loss basin misalignment. To reuse an LTH mask with a different random initialization, we leverage permutation symmetries to permute the mask to align with the new random initialization's optimization basin."

[Figure: visualization of a 2D loss landscape; (left) dense training & pruning, (right) permuting the mask enables sparse training.]

Result: At sparsities of 0.80, 0.90, 0.95, and 0.97, a sparse model with the permuted mask and a new random initialization can nearly match LTH solution performance. We show that by leveraging weight symmetries, the LTH mask can be used with new random inits.
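A minimal sketch of the core operation: permuting a layer's pruning mask with the unit permutations found by a weight-matching procedure. The matching itself (e.g., Git Re-Basin-style alignment) is omitted, and `perm_out`/`perm_in` are assumed given.

```python
import numpy as np

def permute_mask(mask, perm_out, perm_in):
    """Apply output/input unit permutations to a layer's binary pruning mask.

    mask: (out_features, in_features) 0/1 array from the LTH ticket.
    perm_out, perm_in: permutations aligning this layer's units with the
    new random initialization's optimization basin.
    """
    return mask[np.ix_(perm_out, perm_in)]

# Consistency matters: the input permutation of layer l must equal the
# output permutation of layer l-1, so the permuted masks still describe
# the same sparse sub-network up to a relabeling of hidden units.
```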

15 of 25

Chehre: Understanding the Language of Facial Expressions

Bita Azari, Zoe Stanley, Avneet Batra, Poorvi Bhatia, Hali Kil, Manolis Savva, Angelica Lim

Simon Fraser University (SFU)

[Figure: example facial expressions with annotations such as "Flattered", "saddened", and "I'm mad!"; study prompts: "What does this emoji mean to you?" and "Please express the emoji."]

Contact Me: bazari@sfu.ca

16 of 25

What can satellite imagery and machine learning measure?

Jonathan Proctor, Tamma Carleton, Trinetta Chong, Taryn Fransen, Simon Greenhill, Jessica Katz, Hikari Murayama, Luke Sherman, Jeanette Tseng, Hannah Druckenmiller, Solomon Hsiang

Approach: We conduct 115 standardized large-scale experiments using a composite high-resolution optical image of Earth and a generalizable and accessible SIML technology to evaluate which ground conditions can be accurately measured and where this technology struggles.

Results

  • Imagery explains 45% of the spatial variation in ground measurements across all 115 experiments. Predictive power is notably high for many variables that have never before been predicted at scale from imagery, such as vehicle ownership (R2=0.66), agricultural involvement (R2=0.62), and life expectancy (R2=0.72).
  • SIML technology is equally valuable for monitoring human systems as it is for natural systems. While historically emphasis has been placed on remotely sensing natural systems, we show that socio-economic variables are, in general, just as clearly visible from imagery as environmental variables.
  • Direct visual representation is critical for SIML technology's success. Variables that are directly represented in the imagery (e.g., forest cover) are easier to predict, especially when they are geographically far from training data, than variables that are indirectly represented (e.g., income).
  • SIML success varies systematically with income and population density, underscoring inequities in access to and in the likely development of remote monitoring systems that are useful in practice.
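To make the evaluation concrete: each experiment amounts to regressing a ground-measured variable on features extracted from imagery and scoring out-of-sample R2. A minimal sketch with scikit-learn, where `image_features` and `ground_truth` are placeholders for the per-location feature matrix and labels, not the study's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
image_features = rng.normal(size=(1000, 256))   # placeholder imagery features
ground_truth = rng.normal(size=1000)            # placeholder ground measurements

# Ridge regression with cross-validated regularization strength.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
r2_scores = cross_val_score(model, image_features, ground_truth,
                            cv=5, scoring="r2")
print(f"mean out-of-sample R2: {r2_scores.mean():.2f}")
```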

17 of 25

Reliable ML against Faulty Training Data

Problem: Faulty training data, e.g., mislabeled images in AVs ("Label: Stop sign") and in medical imaging ("Label: Pneumonia").

Insight: Diversity increases reliability!

Our Solutions: ensembles, dynamic weights using XAI, and diversity-guided ensemble search.

Results: the solutions are 24% and 28% more resilient, with the strongest being the most resilient at minimal effort.

Abraham Chan, UBC (abrahamc@ece.ubc.ca)
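A minimal sketch of the underlying idea that diverse ensembles tolerate faulty labels: train several different model families on the same (possibly mislabeled) data and combine them by majority vote. Purely illustrative, not the paper's search procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, y_train = X[:1500], y[:1500].copy()
X_test, y_test = X[1500:], y[1500:]

# Inject label faults into 20% of the training data.
flip = np.random.default_rng(0).random(len(y_train)) < 0.2
y_train[flip] = 1 - y_train[flip]

# A diverse ensemble: different inductive biases, same faulty data.
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=0),
          KNeighborsClassifier()]
preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])
vote = (preds.mean(axis=0) > 0.5).astype(int)   # majority vote
print("ensemble accuracy:", (vote == y_test).mean())
```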

18 of 25

Lifelong Learned Video Diffusion Models Work

  • Task: Future frame generation by streaming training on a one-million-frame-long video

  • Algorithm: Experience replay
  • Experiments: Four curated video datasets

Yoo, J., He, Y., Naderiparizi, S., Green, D., van de Ven, G. M., Pleiss, G., Wood, F. (2024). Lifelong Learning of Video Diffusion Models From a Single Video Stream. arXiv preprint arXiv:2406.04814.

[Figure: ground truth vs. generated samples 1 and 2 for Datasets 1 and 2]

Paper
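A minimal sketch of the experience-replay training loop named above: stream the frames once, keep a buffer of past clips (a reservoir buffer is one common choice; the paper's exact buffer policy may differ), and mix replayed clips into each diffusion update. `diffusion_train_step` is a placeholder for the model's actual loss and update.

```python
import random

def reservoir_add(buffer, item, seen, capacity=10_000):
    """Reservoir sampling keeps a uniform sample of the stream seen so far."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randrange(seen)
        if j < capacity:
            buffer[j] = item

def train_on_stream(stream, diffusion_train_step, batch_size=16):
    buffer, seen = [], 0
    for clip in stream:                        # a single pass over the video
        seen += 1
        replay = random.sample(buffer, min(batch_size - 1, len(buffer)))
        diffusion_train_step([clip] + replay)  # mix new data with replay
        reservoir_add(buffer, clip, seen)
```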

19 of 25

We Define and Analyze Memorization in Novel Models

20 of 25

When Backtracking isn’t Reasoning

Yunpeng Liu1,2 Berend Zwartsenberg1 Frank Wood1,2,3

Do imitative backtracking reasoning models improve reasoning scores by making corrections?

  • We found evidence that the backtracking reasoning model is making use of additional computation while effectively ignoring the contents of its “mistaken reasoning steps.”

[Figure: loss mask for imitative backtracking]

Random replacements yield performance similar to model-generated, human-like mistaken reasoning steps.

The reasoning score increases as the number of random reasoning attempts followed by BACK tokens increases.
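A minimal sketch of the random-replacement control described above: take a backtracking trace and substitute the tokens of the mistaken attempts (everything before the last BACK token) with random vocabulary tokens, leaving the BACK tokens and the final answer intact. The token layout and names are illustrative assumptions, not the paper's exact setup.

```python
import random

BACK = -1  # illustrative sentinel id for the BACK token

def randomize_mistaken_steps(tokens, vocab_size):
    """Replace mistaken-attempt tokens with random ids, keeping BACK tokens
    and everything after the last BACK (the final answer) unchanged."""
    backs = [i for i, t in enumerate(tokens) if t == BACK]
    if not backs:
        return tokens
    last_back = backs[-1]
    out = []
    for i, t in enumerate(tokens):
        if i < last_back and t != BACK:
            out.append(random.randrange(vocab_size))  # scramble the content
        else:
            out.append(t)                             # keep BACK + answer
    return out
```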

Inverted AI1, University of British Columbia2, Amii3

yunpengl@cs.ubc.ca

21 of 25

Visual-Concepts-Driven Image Generation

22 of 25

  • Time-aligned, video, audio, action data
  • 10000+ hours
  • 10000+ players across the globe
  • Social interactions
  • Continuous world state

How can we create embodied artificial intelligence agents capable of interacting naturally and purposefully with humans in complex, open-ended environments?

Ongoing: Developing real-time EAI agents

  • Action-conditioned world model: diffusion model with long history
  • Agents: VLM, …

PLAICRAFT.AI

(Join & Plai!)

https://blog.plaicraft.ai/

23 of 25

Solving Canada’s Technical AI Safety Talent Gap

Situation

  • 79% of Canadians are concerned about the negative outcomes of AI
  • Almost half (48%) of managers and executives feel their employees are not or barely prepared to use AI
  • Only 24% of Canadians received any AI training
  • Canada ranked 44th out of 47 in AI literacy

Our Proposal

    • Inspired by LISA’s proven ARENA
    • 4-6 week immersive cohorts
    • Hands-on AI safety engineering skills, safety audits & interpretability
    • Includes human-centric/data-centric AI and indigenous perspectives

Strategic Benefits

    • Helps build Canadian technical sovereignty in AI
    • Developing AI and technical AI safety talent for academia, industry and government
    • Fits into the Pan-Canadian Artificial Intelligence Strategy, Pillar 3: Talent and Research

Cole Thacker

cole_thacker@sfu.ca

24 of 25

Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow

25 of 25

Multi-Modeling for Efficiency, Adaptivity, and Uncertainty

LookWhere: Efficiency with Self-Supervised Adaptive Computation

A. Fuller*1, Y. Yassin*1, J. Wen1, D.G. Kyrollos1, T. Ibrahim1, J.R. Green1, E. Shelhamer123

Carleton1 UBC2 Vector Institute3

Anthony Fuller, Carleton PhD student + Vector intern. Poster Session #1.

Asymmetric Duos: Uncertainty with the Help of a Sidekick

T.G. Zhou1, E. Shelhamer12, G. Pleiss12

UBC1 Vector Institute2

Tim G. Zhou, UBC MSc student (incoming). Away, another time!

Open-Set Updates: Adaptivity with More Analysis and Benchmarks (known classes + unknowns)

Zefeng Li12, E. Shelhamer12

UBC1 Vector Institute2

Zefeng Li, UBC PhD student + Vector student. Poster Session #1.