1 of 11

Project Website

2 of 11

3 main challenges in current time series benchmarks

Regular Unimodal Multivariate Time Series: regular timestamps in TS; unimodal data

Irregular Multimodal Multivariate Time Series: irregular timestamps in TS (what’s the cause?); multimodal data w/ asynchronous timestamps

Regular-only assumptions → unrealistic in practice

Multimodal integration with synchronous timestamps → ignores asynchrony

No understanding of irregularity causes → limits interpretability

3 of 11

Time-IMM addresses the lack of realistic, cause-driven benchmarks for irregular multimodal time series

9 multimodal (numerical + text) real datasets capturing distinct causes of irregularity

A unified multimodal forecasting library (IMM-TSF)

Modular fusion strategies for asynchronous numerical–text data

Empirical evidence that modeling multimodality under irregularity yields robust forecasting gains

4 of 11

Time-IMM: Dataset for Irregular Multimodal Multivariate Time Series

Real-world irregularities arise from three fundamental causes, each with unique modeling challenges.

Trigger-Based: Observations occur only when external events or internal triggers happen.

Constraint-Based: Sampling limited by operational schedules, resource availability, or human timing.

Artifact-Based: Irregularity caused by system faults, delays, or multi-source asynchrony.
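
To make the three causes concrete, the minimal sketch below derives irregular timestamps from a regular hourly grid in three different ways. The function names, the threshold, the operating window, and the drop probability are illustrative assumptions and are not taken from Time-IMM.

```python
import random

def trigger_based(values, threshold):
    """Keep a timestamp only when the signal magnitude reaches a threshold,
    i.e. an observation is recorded only when an event fires."""
    return [t for t, v in enumerate(values) if abs(v) >= threshold]

def constraint_based(n_steps, open_hour=9, close_hour=17):
    """Keep only timestamps that fall inside a daily operating window."""
    return [t for t in range(n_steps) if open_hour <= t % 24 < close_hour]

def artifact_based(n_steps, drop_prob, rng):
    """Drop timestamps at random to mimic faults or transmission loss."""
    return [t for t in range(n_steps) if rng.random() >= drop_prob]

rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(48)]  # two days, hourly (synthetic)

trig = trigger_based(values, threshold=1.5)
cons = constraint_based(48)
arti = artifact_based(48, drop_prob=0.3, rng=rng)
```

Each list holds the surviving hour indices. The same underlying grid produces three different sampling patterns, which is why the cause of irregularity matters for modeling.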

5 of 11

Dataset Construction Pipeline

Numerical Data

Real-world time series for each irregularity type
Preserve native timestamps (no resampling)

Textual Data

Collect relevant reports, logs, or notes linked to each dataset
Filter & summarize using GPT-4.1 Nano
Retain original timestamps for text entries

Multimodal Integration

Combine numerical and textual data while preserving asynchronous timestamps
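
A minimal sketch of what "preserving asynchronous timestamps" can look like in practice: each modality keeps its own sorted time axis, and nothing is resampled onto a shared grid. The `MultimodalSeries` class, its field names, and the example text entries are hypothetical and do not reflect the actual Time-IMM file layout.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class MultimodalSeries:
    # Each modality keeps its own native timestamps, sorted ascending.
    num_times: list = field(default_factory=list)
    num_values: list = field(default_factory=list)
    text_times: list = field(default_factory=list)
    texts: list = field(default_factory=list)

    def history(self, t):
        """Return the numerical observations and text entries with timestamp <= t."""
        i = bisect_right(self.num_times, t)
        j = bisect_right(self.text_times, t)
        nums = list(zip(self.num_times[:i], self.num_values[:i]))
        notes = list(zip(self.text_times[:j], self.texts[:j]))
        return nums, notes

series = MultimodalSeries(
    num_times=[0.0, 1.5, 4.0, 4.2],
    num_values=[[0.1], [0.3], [0.2], [0.9]],
    text_times=[0.7, 3.9],  # text arrives on its own schedule
    texts=["maintenance scheduled", "load spike reported"],
)
nums, notes = series.history(4.0)
```

Querying the history at a forecast time returns only what was observable by then in each modality, without forcing text and numbers onto the same timestamps.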

6 of 11

Problem Formulation: Irregular Multimodal Multivariate Time Series Forecasting

Predict future time series values using irregularly sampled numerical data and asynchronous textual context.
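
One way to write this task down is sketched below. The notation (forecast origin $t_0$, observation set $\mathcal{X}$, text set $\mathcal{S}$, query set $\mathcal{Q}$, model $f_\theta$) is assumed here for illustration and is not taken from the slides.

```latex
\begin{aligned}
\mathcal{X} &= \{(t_i, c_i, x_i)\}_{i=1}^{N}, \quad t_i \le t_0
  && \text{irregular numerical observations (time, variable index, value)} \\
\mathcal{S} &= \{(\tau_j, s_j)\}_{j=1}^{M}, \quad \tau_j \le t_0
  && \text{text entries with their own timestamps} \\
\mathcal{Q} &= \{(q_k, v_k)\}_{k=1}^{K}, \quad q_k > t_0
  && \text{future query times and variable indices} \\
\hat{x}_k &= f_\theta(\mathcal{X}, \mathcal{S}, q_k, v_k), \quad k = 1, \dots, K
  && \text{forecast at each query}
\end{aligned}
```

Neither the $t_i$ nor the $\tau_j$ are assumed to lie on a regular grid, and the two sets of timestamps need not coincide.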

7 of 11

IMM-TSF: A Benchmark Library for Irregular Multimodal Multivariate Time Series Forecasting

Timestamp-to-Text Fusion (TTF)

RecAvg: recency-weighted aggregation of past text embeddings
T2V-XAttn: Time2Vec-augmented cross-attention for temporal relevance
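
The idea behind recency-weighted aggregation can be sketched as follows. This is an illustration under stated assumptions, not the IMM-TSF implementation: the exponential decay form, the `tau` decay scale, and the function name `rec_avg` are all hypothetical choices.

```python
import math

def rec_avg(query_time, text_times, text_embeddings, tau=1.0):
    """Recency-weighted average of text embeddings observed at or before query_time.

    Weights decay exponentially with the age of each text entry. `tau` is a
    hypothetical decay scale, not a parameter taken from IMM-TSF.
    """
    past = [(t, e) for t, e in zip(text_times, text_embeddings) if t <= query_time]
    if not past:
        return None  # no text is available yet at this time
    weights = [math.exp(-(query_time - t) / tau) for t, _ in past]
    total = sum(weights)
    dim = len(past[0][1])
    return [
        sum(w * e[k] for w, (_, e) in zip(weights, past)) / total
        for k in range(dim)
    ]

# Toy 2-d embeddings; the entry at time 5.0 lies in the future and is ignored.
times = [0.0, 2.0, 5.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
fused = rec_avg(2.0, times, embeddings)
```

In this toy call the more recent entry (time 2.0) dominates the result, while the older entry (time 0.0) still contributes with a smaller weight.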

Multimodality Fusion (MMF)

GR-Add: GRU-gated residual addition for adaptive text influence
XAttn-Add: cross-attention addition between numerical and textual features
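
A gated residual addition can be sketched as below: a gate computed from both modalities decides, per dimension, how much of a text-informed candidate is added to the numerical features. This is one plausible reading of "GRU-gated residual addition" and not the IMM-TSF code; the weight matrices, shapes, and function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gated_residual_add(h_num, h_text, W_gate, W_cand):
    """Return h_num + z * candidate, where z is a GRU-style update gate in (0, 1)."""
    joint = h_num + h_text  # list concatenation of the two feature vectors
    z = [sigmoid(a) for a in matvec(W_gate, joint)]
    cand = [math.tanh(a) for a in matvec(W_cand, joint)]
    return [h + zi * c for h, zi, c in zip(h_num, z, cand)]

rng = random.Random(0)
d = 4
# Random weights stand in for learned parameters (assumption for the demo).
W_gate = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
W_cand = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]

h_num = [0.2, -0.1, 0.4, 0.0]    # numerical features (toy values)
h_text = [1.0, 0.5, -0.3, 0.7]   # text features (toy values)
fused = gated_residual_add(h_num, h_text, W_gate, W_cand)
```

Because the text contribution enters as a residual, the numerical features pass through unchanged whenever the candidate is zero, and the gate bounds how far any dimension can move.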

plug-and-play

8 of 11

Effectiveness of Multimodality

Across all nine Time-IMM datasets, incorporating textual information consistently improves forecasting accuracy compared to unimodal (numerical-only) models.

Average MSE reduction: 6.7%

Maximum improvement: 38.4% in datasets with highly informative text
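
For reference, a percentage MSE reduction of this kind is typically computed relative to the unimodal baseline. The sketch below assumes that definition; the forecasts are made-up toy values and are unrelated to the reported 6.7% and 38.4% figures.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(mse_unimodal, mse_multimodal):
    """Percentage by which the multimodal MSE is lower than the unimodal MSE."""
    return 100.0 * (mse_unimodal - mse_multimodal) / mse_unimodal

y_true = [1.0, 2.0, 3.0, 4.0]
uni_pred = [1.5, 2.5, 2.5, 4.5]        # hypothetical numerical-only forecast
multi_pred = [1.25, 2.25, 2.75, 4.25]  # hypothetical forecast with text

reduction = relative_reduction(mse(y_true, uni_pred), mse(y_true, multi_pred))
print(reduction)  # 75.0 for these toy values
```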

9 of 11

Multimodal Forecasting Analysis

Gains Across Datasets

Multimodal models outperform unimodal baselines on all datasets, with larger gains when text provides strong contextual signals (e.g., ClusterTrace).

Fusion Strategies

GR-Add gives the most stable and accurate results; both RecAvg and T2V-XAttn perform similarly.

Frozen LLM Backbones

Text encoder choice has limited effect — forecasting depends more on temporal alignment than on large-scale language understanding.

Figure panels: (a) Gains Across Datasets, (b) Fusion Strategies, (c) Frozen LLM Backbones

Here we take a closer look at the multimodal forecasting results.

Starting with panel (a), we can see that multimodal models outperform their unimodal counterparts across all nine datasets. The improvements are especially large when the textual information carries strong contextual signals. In ClusterTrace, for example, system logs directly describe workload behavior that’s highly relevant to future usage.

Panel (b) compares different fusion strategies in IMM-TSF. Overall, the GR-Add module provides the most stable and accurate results, thanks to its adaptive gating mechanism that controls how much the text influences the forecast. Both RecAvg and T2V-XAttn perform similarly, suggesting that recency and temporal alignment are both effective ways to handle asynchronous text.

Finally, panel (c) evaluates different frozen LLM backbones for the text encoder. Interestingly, the choice of language model (GPT-2, BERT, or Llama) has only a minor effect. This suggests that forecasting in irregular multimodal settings depends more on temporal alignment and context timing than on large-scale semantic understanding from the LLM.

10 of 11

Open Source

Code is available at https://github.com/blacksnail789521/IMM-TSF

Dataset is available at https://github.com/blacksnail789521/Time-IMM

Project Website

11 of 11

Thank You!

chingchang0730@ucla.edu
